A delta load (or incremental load) refers to extracting only the data that has changed since the last time the extract process has run. This process is typically query-based and requires an incrementing id or timestamp column that can be used to determine new records. ^overview-delta-load
```mermaid
%%{init: { "flowchart": { "useMaxWidth": true } } }%%
graph TD
subgraph S1 [Initial Load]
direction LR
A1[(Source<br/>100,000 records)] -->|Extract all records| B1[Ingestion Process]
B1 -->|Load all records| C1[(Destination<br/>100,000 records)]
end
subgraph S2 [Subsequent Runs]
direction LR
A2[(Destination<br/> 100,000 records)] -->|"Query for MAX(modified_at)<br/>from Destination"| D[Latest Timestamp]
D -->|"Query source using timestamp to filter"|A[(Source<br/>100,500 records)]
A -->|Load 500 new/changed records| B[Ingestion Process]
B --> C[(Destination<br/>100,500 records)]
end
S1 --> S2
```
^overview-delta-load-diagram
The most commonly used steps to perform a delta load are:
1. Ensure there is a `modified_at` timestamp or incremental id column such as a primary key on the data source.
2. On the initial run of the pipeline, do a full load of the dataset.
3. On following runs of the pipeline, query the target dataset using `MAX(column_name)`.
4. Query the source dataset and filter records where values are greater than the value from step 3.
## Delta Load Advantages
- More resource efficient
- Easy to implement and maintain
- Only requires read permissions to perform
## Delta Load Disadvantages
- Does not capture deleted records
- Requires extra metadata on the source (commonly a unique id or updated timestamp)
- Does not capture multiple changes between the polling interval. If a row changes multiple times, you may only capture the latest state.
- Querying the database for changes may impact the database performance.
%% wiki footer: Please don't edit anything below this line %%
## This note in GitHub
<span class="git-footer">[Edit In GitHub](https://github.dev/data-engineering-community/data-engineering-wiki/blob/main/Concepts/Data%20Ingestion/Delta%20Load.md "git-hub-edit-note") | [Copy this note](https://raw.githubusercontent.com/data-engineering-community/data-engineering-wiki/main/Concepts/Data%20Ingestion/Delta%20Load.md "git-hub-copy-note")</span>
<span class="git-footer">Was this page helpful?
[👍](https://tally.so/r/mOaxjk?rating=Yes&url=https://dataengineering.wiki/Concepts/Data%20Ingestion/Delta%20Load) or [👎](https://tally.so/r/mOaxjk?rating=No&url=https://dataengineering.wiki/Concepts/Data%20Ingestion/Delta%20Load)</span>