
Overview

The Sync Operation is a specialized collection mode that finds records whose data has drifted between the source system and Entegrata, and pulls only those changes. Unlike a full refresh (which copies everything) or an incremental load (which relies on a cursor date), sync compares the actual contents of records on both sides and only collects what’s different.
Sync is primarily designed for tables that do not have a reliable cursor field (such as LAST_MODIFIED or UPDATED_AT) to support incremental loads. When a table lacks such a field, incremental loads aren’t possible, and running repeated full refreshes can be prohibitively expensive. Sync fills this gap by detecting changes through content comparison instead of a timestamp cursor.
Sync operates in two modes, Full Sync and Change Data Load. The right choice depends on whether your table has a reliable date field.
Sync is a much heavier operation than a regular Full Load or Incremental Load. Only use sync when a table does not have a reliable cursor field for incremental loading. If your table has a dependable LAST_MODIFIED or UPDATED_AT field, an Incremental Load is almost always the better choice.
Sync is currently only available for SQL data sources. Most APIs do not expose the kind of row-level checksum operations needed to perform sync efficiently, so sync is not available for API-based sources.

How Sync Works

At a high level, sync compares “fingerprints” of records between the source and Entegrata, then only pulls the records whose fingerprints don’t match.
1. Group records into buckets

Both the source and Entegrata partition records into buckets. Each bucket holds a subset of rows: for example, one bucket per day, or one bucket per range of primary keys.

2. Compare bucket fingerprints

For each bucket, Entegrata computes a fingerprint (a combined hash of all record data) on both sides. Buckets with matching fingerprints are skipped because the data is already in sync.

3. Drill down into mismatched buckets

When a bucket’s fingerprint differs, sync progressively narrows down to find the smallest subset of records where the difference lives (e.g., Year → Month → Day).

4. Collect the changed records

The records in mismatched buckets are pulled from the source.

5. Apply only the true changes

Entegrata compares each collected record against the version already stored. Only records that are truly new or changed are written; unchanged records in the same bucket are discarded.
Sync never deletes records. If a record has been removed from the source, use the separate Delete Operation to detect and handle it.
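
The bucket comparison in the steps above can be illustrated with a minimal Python sketch. It is not Entegrata’s implementation: the function names, the use of SHA-256, and the in-memory lists standing in for the source table and the stored copy are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def row_fingerprint(row: dict) -> str:
    """Hash one record's contents; keys are sorted so column order is irrelevant."""
    canonical = "|".join(f"{k}={row[k]!r}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def bucket_fingerprints(rows, bucket_of) -> dict:
    """Return {bucket: combined fingerprint of every row in that bucket}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[bucket_of(row)].append(row_fingerprint(row))
    # Sort the row hashes so the combined hash does not depend on row order.
    return {
        bucket: hashlib.sha256("".join(sorted(prints)).encode("ascii")).hexdigest()
        for bucket, prints in grouped.items()
    }


def mismatched_buckets(source_rows, stored_rows, bucket_of) -> set:
    """Buckets whose fingerprint differs (or exists on one side only)."""
    src = bucket_fingerprints(source_rows, bucket_of)
    dst = bucket_fingerprints(stored_rows, bucket_of)
    return {b for b in src.keys() | dst.keys() if src.get(b) != dst.get(b)}


def rows_to_collect(source_rows, stored_rows, bucket_of) -> list:
    """Only rows in mismatched buckets are pulled from the source."""
    flagged = mismatched_buckets(source_rows, stored_rows, bucket_of)
    return [r for r in source_rows if bucket_of(r) in flagged]


# Hypothetical data: one bucket per range of ten primary keys.
source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 11, "name": "c"}]
stored = [{"id": 1, "name": "a"}, {"id": 2, "name": "OLD"}, {"id": 11, "name": "c"}]
by_key_range = lambda row: row["id"] // 10

flagged = mismatched_buckets(source, stored, by_key_range)
collected = rows_to_collect(source, stored, by_key_range)
```

In this example only the bucket holding ids 1 and 2 is flagged, so the row with id 11 is never pulled. The unchanged row with id 1 is still collected because it shares a bucket with a changed row, which is why the final step filters again before writing.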

Sync Modes

Sync operates in two different modes depending on how it’s scheduled and whether the resource has a date field available.

Full Sync (Standalone Operation)

Full Sync is a standalone, on-demand operation. It runs separately from your normal collection schedule and is typically triggered manually or on a less-frequent cadence (e.g., weekly). Full Sync uses the resource’s primary key (or unique identifier) to group records into buckets and compares the entire table across the source and Entegrata.
Best for:
  • Tables without a reliable cursor field AND without a reliable date field
  • Periodic verification that a full table is in sync
  • Reconciling after an outage or incident
Full Sync uses a hash of the primary key to assign each record to a bucket. This means a record always falls in the same bucket, regardless of how its values change — keeping the comparison stable.
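
As a rough illustration of why key-based bucketing is stable, the sketch below assigns a bucket from a hash of the primary key alone. The bucket count, the choice of SHA-256, and the function name are hypothetical; the source does not specify which hash or how many buckets Entegrata uses.

```python
import hashlib

NUM_BUCKETS = 256  # hypothetical bucket count, chosen only for illustration


def bucket_for_key(primary_key, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a primary key to a bucket using only the key, never the row's values."""
    digest = hashlib.sha256(str(primary_key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce to a bucket index.
    return int.from_bytes(digest[:8], "big") % num_buckets


# The same record before and after an update lands in the same bucket,
# because only the primary key feeds the hash.
before = {"id": 4711, "status": "open"}
after = {"id": 4711, "status": "closed"}
same_bucket = bucket_for_key(before["id"]) == bucket_for_key(after["id"])
```

A cryptographic digest is used here instead of Python’s built-in hash(), which is salted per process for strings and would not give repeatable buckets across runs.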

Change Data Load (Scheduled Load Type)

Change Data Load is a load type (selected on the resource’s Collection Settings page, alongside Full Load and Incremental Load). Unlike Full Sync, Change Data Load runs on the resource’s normal collection schedule: every time the resource collects, it performs a sync instead of a traditional full or incremental load.
Change Data Load requires a date field on the resource (e.g., created_date, transaction_date) and groups records by Year → Month → Day. This is the most efficient sync mode because changes in real-world data tend to cluster around recent dates.
Best for:
  • Tables without a reliable cursor field for incremental loading, but WITH a stable date field
  • Large historical tables where most records are older and unlikely to change
  • Scheduled catch-up on drift (configured once, runs automatically)
How it narrows down:
Level | What it compares
Year | All records in each year
Month | Within mismatched years, compares each month
Day | Within mismatched months, compares each day
If years 2020–2023 match between source and Entegrata, those entire years are skipped — no month or day level work is needed for them. This makes Change Data Load extremely efficient for large historical tables.
To use Change Data Load, open the resource’s Collection Settings, choose Change Data Load as the load type, and select a date field. See Collection Settings for details.
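
The Year → Month → Day narrowing can be sketched as follows. This is an illustrative model only, assuming in-memory rows, a SHA-256 fingerprint, and hypothetical function names; it is not how Entegrata executes the comparison against a SQL source.

```python
import hashlib
from collections import defaultdict
from datetime import date


def date_prefix(d: date, depth: int) -> tuple:
    """depth 1 -> (year,), 2 -> (year, month), 3 -> (year, month, day)."""
    return (d.year, d.month, d.day)[:depth]


def fingerprints(rows, date_field: str, depth: int) -> dict:
    """Combined fingerprint of all rows sharing a date prefix at this depth."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[date_prefix(row[date_field], depth)].append(repr(sorted(row.items())))
    return {
        prefix: hashlib.sha256("\n".join(sorted(parts)).encode("utf-8")).hexdigest()
        for prefix, parts in grouped.items()
    }


def drifted_days(source_rows, stored_rows, date_field: str) -> set:
    """Return the (year, month, day) buckets whose contents differ."""
    candidates = None  # None means "the whole table" at the year level
    for depth in (1, 2, 3):
        def in_scope(row, parents=candidates, d=depth):
            # Only look inside periods that mismatched at the previous level.
            return parents is None or date_prefix(row[date_field], d - 1) in parents

        src = fingerprints([r for r in source_rows if in_scope(r)], date_field, depth)
        dst = fingerprints([r for r in stored_rows if in_scope(r)], date_field, depth)
        candidates = {p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p)}
        if not candidates:
            break  # everything matches; no deeper work needed
    return candidates


# Hypothetical data: only the record dated 2024-06-02 has drifted.
source = [
    {"id": 1, "created_date": date(2021, 3, 5), "amount": 10},
    {"id": 2, "created_date": date(2024, 6, 1), "amount": 20},
    {"id": 3, "created_date": date(2024, 6, 2), "amount": 30},
]
stored = [
    {"id": 1, "created_date": date(2021, 3, 5), "amount": 10},
    {"id": 2, "created_date": date(2024, 6, 1), "amount": 20},
    {"id": 3, "created_date": date(2024, 6, 2), "amount": 99},
]
days = drifted_days(source, stored, "created_date")
```

Here the year 2021 matches at the first level and is never examined again; only 2024, then June 2024, then the two days in that month are compared.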

When To Use Sync

Sync is designed for tables where Incremental Load is not viable because the table lacks a reliable cursor field.

Decision guide

Table has a reliable cursor field (UPDATED_AT, LAST_MODIFIED)? | Has a reliable date field (e.g., created_date)? | Recommended approach
Yes | Either | Incremental Load (sync is unnecessary)
No | Yes | Change Data Load (scheduled sync with date-based drill-down)
No | No | Full Sync (standalone, run on-demand) + Full Load on a less frequent cadence
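
The same decision guide, written as a small Python function for readers who prefer code to a table. The function name and the returned labels are illustrative only; they are not Entegrata API values.

```python
def recommend_load_type(has_reliable_cursor: bool, has_reliable_date: bool) -> str:
    """Mirror the decision guide: cursor field first, then date field."""
    if has_reliable_cursor:
        # A dependable UPDATED_AT / LAST_MODIFIED makes sync unnecessary.
        return "Incremental Load"
    if has_reliable_date:
        # No cursor, but a stable date field enables date-based drill-down.
        return "Change Data Load"
    # Neither field: standalone sync plus an occasional full refresh.
    return "Full Sync + Full Load on a less frequent cadence"


recommendation = recommend_load_type(has_reliable_cursor=False, has_reliable_date=True)
```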

Typical scenarios

Tables without cursor fields

When a SQL table has no UPDATED_AT or LAST_MODIFIED field, use Change Data Load or Full Sync instead of repeated full refreshes.

After outages

Run Full Sync after a source system outage or Entegrata downtime to reconcile any gaps.

Large historical tables

Use Change Data Load on large tables with a stable date field — only recent partitions get compared in depth.

Verification

Run Full Sync before important downstream processes to verify the data is fully in sync.

When NOT to use sync

  • Tables with a reliable cursor field — use Incremental Load instead; it’s dramatically lighter
  • Small tables — a Full Load is simpler and not much more expensive
  • Tables that change constantly across all records — most buckets will be flagged, and sync provides little benefit over a full refresh
  • API-based data sources — sync is only available for SQL data sources

What Sync Does vs. Doesn’t Do

Detects changed records

Sync finds records whose values differ between source and Entegrata

Detects new records

Records that exist in the source but not in Entegrata are collected

Handles ALL data fields

Sync compares every column, not just the cursor field, so it catches updates that incremental loads would miss (binary columns are the exception; see Limitations)

Only writes true changes

Unchanged records that happen to share a bucket with changed ones are discarded before writing

Does NOT detect deletes

Records removed from the source require the separate Delete Operation

Does NOT replace normal loads

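Full Sync runs in addition to your scheduled Full Load or Incremental Load; it does not replace them. Change Data Load is the exception, because it is itself the resource’s load type.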
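
The write behavior summarized above (insert new records, update changed ones, discard unchanged ones, never delete) can be modeled with a short sketch. The dictionary standing in for Entegrata’s stored copy and the function name are assumptions for illustration.

```python
def apply_sync(stored: dict, collected: list, key: str = "id") -> dict:
    """Write only true changes into `stored` ({primary key: row}); never delete."""
    counts = {"inserted": 0, "updated": 0, "discarded": 0}
    for row in collected:
        pk = row[key]
        if pk not in stored:
            stored[pk] = dict(row)      # new record
            counts["inserted"] += 1
        elif stored[pk] != row:
            stored[pk] = dict(row)      # changed record
            counts["updated"] += 1
        else:
            counts["discarded"] += 1    # unchanged row that shared a flagged bucket
    # Rows in `stored` that were not collected are left untouched,
    # so a record removed from the source is NOT removed here.
    return counts


# Hypothetical data: id 3 no longer exists in the source.
stored = {
    1: {"id": 1, "name": "a"},
    2: {"id": 2, "name": "old"},
    3: {"id": 3, "name": "removed at source"},
}
collected = [
    {"id": 1, "name": "a"},    # unchanged
    {"id": 2, "name": "new"},  # changed
    {"id": 4, "name": "d"},    # new
]
result = apply_sync(stored, collected)
```

After the call, the record with id 3 is still present, which is the reason the separate Delete Operation exists.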

Performance Characteristics

Sync is designed to be efficient on large tables by comparing fingerprints rather than pulling data upfront.
Change volume | What sync does
Few changes | Only a handful of buckets are flagged; very fast
Many changes | More buckets are flagged; sync may approach the cost of a full refresh
No changes | Sync completes quickly after the first comparison round with zero data pulled
For the best performance, use Change Data Load sync on tables with a date field. The Year → Month → Day drill-down means only the parts of the table with actual changes are examined closely — unchanged years are skipped entirely.

Limitations

A few data types and scenarios are not fully supported by sync:
  • Binary columns (VARBINARY, BINARY, IMAGE) are excluded from the comparison
  • Non-ASCII Unicode text in certain string columns may produce false positives (records flagged as changed when they are not); this is not an issue for data containing only ASCII characters
  • Extremely large tables with widespread changes — if most of the table has drifted, sync will flag most buckets and the operation becomes similar in cost to a full refresh
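
One practical consequence of the binary-column limitation is that a change confined to an excluded column would presumably go unnoticed by the comparison. The sketch below illustrates that inference under stated assumptions: the schema mapping, function names, and SHA-256 fingerprint are hypothetical and do not describe Entegrata’s internals.

```python
import hashlib

EXCLUDED_TYPES = {"VARBINARY", "BINARY", "IMAGE"}  # types listed under Limitations


def comparable_columns(column_types: dict) -> list:
    """Columns that take part in the comparison, in a stable order."""
    return sorted(c for c, t in column_types.items() if t.upper() not in EXCLUDED_TYPES)


def fingerprint(row: dict, column_types: dict) -> str:
    """Fingerprint a row using only its comparable (non-binary) columns."""
    canonical = "|".join(f"{c}={row[c]!r}" for c in comparable_columns(column_types))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical schema and rows: only the binary `photo` column differs.
schema = {"id": "INT", "name": "VARCHAR", "photo": "VARBINARY"}
source_row = {"id": 1, "name": "Ada", "photo": b"\x89PNG-new"}
stored_row = {"id": 1, "name": "Ada", "photo": b"\x89PNG-old"}

looks_in_sync = fingerprint(source_row, schema) == fingerprint(stored_row, schema)
```

In this model the two rows produce the same fingerprint even though the photo bytes differ, so tables where binary content matters may still need a periodic Full Load.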

Collection Settings

Configure load types, schedules, and filters for resources

Triggering Collection

Manually run collection jobs on demand

Monitoring Jobs

Track sync execution and performance

Troubleshooting

Resolve resource collection issues