Marketing Automation, Salesforce Marketing Cloud, SFMC Tips & Tricks

Extract data deltas from the data extension

Extract data deltas from the data extension

Exporting data deltas from a marketing automation platform like Salesforce Marketing Cloud has been a widely discussed topic. I’ve seen numerous approaches to tackling this challenge. In this post, we’ll explore various methods for extracting data deltas and delve into my personal favorite. Stay tuned to uncover practical strategies that could revolutionize your data management processes, or not.

Extracting data deltas of previous day, week

This scenario, which is easily set up, has the objective of processing only data created in the previous day or week. The automation is configured to run daily or weekly, systematically collecting data from the preceding period, precisely from midnight to midnight. We can configure it effortlessly, ensuring there are no problems with duplicates or missing record gaps. I’ll discuss later how “gap issue” can cause some records to go missing. There’s just one minor issue that might lead you to consider a different implementation approach. This arises if there’s a downtime in the automation that exceeds one day. In such instances, you would need to adjust the deltas to be selected from the current day minus two, instead of one. Another minor concern, especially with SFMC, is that all timestamps are in CST. This discrepancy could result in data being missed or experiencing a two-day delay if the automation is run in a local timezone where it’s already a new day, but in CST, it’s still considered ‘yesterday’. It’s crucial to ensure your automation runs at a time when CST has also transitioned past midnight.

I found that the best solution for this case is an SQL activity utilizing the DATEDIFF function in MS SQL Server. DATEDIFF efficiently calculates the difference between two dates, making it ideal for this application.

--DATEDIFF ( datepart , startdate , enddate )

SELECT 
some_delta
FROM
sendlog_table sl
WHERE 
 DATEDIFF(  
            DAY,
            sl.Date,
            SYSDATETIME()
         ) = 1

When extracting data with dates in different time zones, it’s crucial to adjust for these differences. I found the best approach is to use the DATEDIFF function in MS SQL Server, combined with time zone conversion functions. This ensures accurate date comparisons.

SELECT 
some_delta
FROM
sendlog_table sl
WHERE 
 DATEDIFF(
            DAY,
            sl.Date AT TIME ZONE 'Central America Standard Time' AT TIME ZONE 'UTC',
            SYSDATETIME() AT TIME ZONE 'Central America Standard Time' AT TIME ZONE 'UTC'
        ) = 1

Next, I will demonstrate how to add a configuration file and retrieve the ‘lastRun’ timestamp from it, which will serve as our time restriction.

Extracting data deltas using configuration file

I have seen couple of implementations where the extract automation run based on the record found in configuration and its lastRun date. This SQL could look like:

SELECT 
some_delta
FROM
sendlog_table sl
INNER JOIN configuration_table_lastRun ct on ct.automationName = 'my_automation_name'
WHERE 
sl.date >= ct.lastRun

Or another way is to use subquery

SELECT 
some_delta
FROM
sendlog_table sl
WHERE 
sl.date >= (SELECT TOP 1 lastRun FROM configuration_table_lastRun WHERE automationName = 'my_automation_name')

In summary, the first query is generally preferable due to its simplicity and potentially better performance. In our case, performance won’t be an issue as the table might only consist of a few records.

The last step of the automation will be updating the “lastRun” date in our configuration_table_lastRun data extension. Now, some of you might be thinking about what happens with records that were created during the automation had started and the automation had finished on the very next run?

In scenarios where there is a time gap between our first activity and the last, wherein we save the ‘lastRun’ to the configuration table, we could encounter a “gap issue”. This could potentially lead to the loss of some records. I have experienced gaps ranging from a few minutes to several hours. These occurred when the delta query had to be divided into multiple SQL activities due to the extensive volume of records resulted in activity timeouts.

To better illustrate the “Gap issue” (If you understood you can skip following example):

  1. Automation Process: Imagine you have an automated system that regularly processes data from a database. This process is scheduled to run at specific intervals – say every hour – and updates the ‘lastRun’ timestamp in a configuration table once it completes.
  2. The Gap Issue: The gap occurs in the period between the start and the completion of an automation run. Here’s how:
    • Start Time: The process begins, let’s say, at 1:00 PM. It notes the ‘lastRun’ timestamp (for example, 12:00 PM from the previous run).
    • Data Generation: During the process (from 1:00 PM onwards), new data entries are continuously being added to the database.
    • End Time: The process finishes at 1:30 PM and updates the ‘lastRun’ timestamp to 1:30 PM.
  3. Potential Data Loss: The problem is with the data that was added between 1:00 PM and 1:30 PM. Since the next run of the process will start by looking at the ‘lastRun’ timestamp (now 1:30 PM), it might miss processing the data entered during that 30-minute window.
  4. Why It Happens:
    • Long Processing Time: If the data volume is huge, the process might take a long time, widening this gap.
    • Split Queries: In cases where you have to split your SQL queries to manage large datasets, this can further complicate synchronizing the data, increasing the risk of missing records that were added while the previous queries were running.

In summary, the gap in records is a synchronization issue in data processing automation, where records added during an ongoing process might not be captured in the subsequent process due to the timing mismatch.

Is there a way to mitigate our “gap issue”? Indeed, I might have a solution. Let’s examine it next.

Extracting data deltas using configuration file v2

To resolve our gap issue, we need to synchronize our ‘lastRun’ timestamps accurately in two places:

  • At the beginning of the automation in our SQL select.
  • In the configuration file when the automation finishes.

We will add an additional field to our ‘last run’ data extension, designated to store the actual date when the automation started. This date will be employed both as a boundary setter for our selection process and as the ‘lastRun’ date for the next execution of our automation.

The initial step in our automation process involves updating the ‘actualRun’ field with the current date. This date will then be used in the subsequent stages of our automation to establish the upper boundary for our queries.

SELECT
GetDate() as actualRun,
automationName
FROM
configuration_table_lastRun
WHERE
automationName = 'SendLog_Archive' 

In our actual SQL query activity that select delta data we will amend following. You can use either inner join or subquery which ever you like.

SELECT 
some_delta
FROM
sendlog_table sl
WHERE 
sl.date >= (SELECT TOP 1 lastRun FROM configuration_table_lastRun WHERE automationName = 'SendLog_Archive')
and 
sl.date < (SELECT TOP 1 actualRun FROM configuration_table_lastRun WHERE automationName = 'SendLog_Archive')

And last but not least we will update our lastRun in configuration_table_lastRun data extension and if needed we can add lastRunT_1 (t-1) timestamp

SELECT
(SELECT TOP 1 actualRun FROM configuration_table_lastRun WHERE automationName = 'SendLog_Archive') AS lastRun,
automationName,
(SELECT TOP 1 lastRun FROM configuration_table_lastRun WHERE automationName = 'SendLog_Archive') AS lastRunT_1
FROM
configuration_table_lastRun
WHERE
automationName = 'SendLog_Archive'

Configuration data extension is capable of storing ‘lastRun’ and ‘actualRun’ timestamps for numerous automations that extract data deltas.

Oh hi there 👋
I have a FREE e-book for you.

Sign up now to get an in-depth analysis of Adobe and Salesforce Marketing Clouds!

We don’t spam! Read our privacy policy for more info.

Marcel Szimonisz
Marcel Szimonisz
MarTech consultant As a marketing automation consultant, I specialize in problem-solving, process automation, and driving innovation for clients' marketing platforms.

I hold certifications in Adobe Campaign v6 (3x certified) and Salesforce Marketing Cloud (5x certified), as well as 1x Salesforce Associate certified.

Moreover, I serve as a community advisor for Adobe Campaign, providing expert insights and guidance.

Beyond my professional pursuits, I explore various programming languages, CMSs, and frameworks, enhancing my technical expertise and staying at the forefront of industry advancements.
Take a look at our subscription offering in case you are looking for a Marketing Automation Consultant.

Similar posts that you may find useful

An interactive whiteboard in an educational setting displays a query expression builder, highlighting the use of variables in a complex query. A group of engaged learners discuss the color-marked variables. The room is equipped with data analysis and programming resources, enhancing the learning atmosphere.
ACC Tips & Tricks, Adobe Campaign, Marketing Automation

Use variables in query expression builder

2 minutes read

When the time comes, you will find yourself in a situation where you would like to set up query builder condition expressions as dynamic values. We can achieve this and more with Adobe Campaign Classic. But what are those situations? For example, I had to build a reporting workflow for automated campaigns that was divided […]

Continue reading
Adobe Campaign post
Adobe Campaign, Marketing Automation

Implementing DKIM in adobe campaign

2 minutes read

Have you ever wondered how to implement DKIM in Adobe Campaign Classic, look no further here is how you can do it. Implementing DKIM for Adobe Campaign Classic, gave many hard times as there is no official documentation you can follow step by step. But nothing is lost, it is easier than you may think. […]

Continue reading
SFMC tips and tricks
Marketing Automation, Salesforce Marketing Cloud, SFMC Tips & Tricks

Validation issues With Data Extension Columns _dat, _call, _from

less than a minute read

Data extension column names with _dat, _call, _from, _join suffix will fail to validate, due to using old style JOIN The fix is really easy you only need to wrap the column name like [test_from] and the problem with validation is history If you know more suffixes that will throw an error let me know

Continue reading
How to JavaScript in SFMC
Marketing Automation, Salesforce Marketing Cloud

JavaScript in Salesforce Marketing Cloud

3 minutes read

Salesforce Marketing Cloud uses JavaScript where a advanced customization is needed e.g. automations, cloud pages and even in message personalization. Last time we discussed how to JavaScript in Adobe Campaign, we discovered that it utilizes an older version of ECMAScript. However, it’s worth noting that Salesforce Marketing Cloud (SFMC) goes even further back and employs […]

Continue reading
Salesforce Marketing Cloud Tips
Marketing Automation, Salesforce Marketing Cloud, SFMC Tips & Tricks

Proof email was previewed but not received

2 minutes read

There are many possible issues, and I will try to list all those I have come across during my times when I wondered where my email is. Contact is unsubscribed If the contact you are trying to preview an email with is on one of the global unsubscribe lists, or has unsubscribed or bounced status […]

Continue reading