Overview
The Sync Operation is a specialized collection mode that finds records whose data has drifted between the source system and Entegrata, and pulls only those changes. Unlike a full refresh (which copies everything) or an incremental load (which relies on a cursor date), sync compares the actual contents of records on both sides and only collects what’s different.
Sync is primarily designed for tables that do not have a reliable cursor field (such as LAST_MODIFIED or UPDATED_AT) to support incremental loads. When a table lacks such a field, incremental loads aren’t possible, and running repeated full refreshes can be prohibitively expensive. Sync fills this gap by detecting changes through content comparison instead of a timestamp cursor.
Sync operates in two modes, Full Sync and Change Data Load; the right choice depends on whether your table has a reliable date field.
Sync is currently only available for SQL data sources. Most APIs do not expose the kind of row-level checksum operations needed to perform sync efficiently, so sync is not available for API-based sources.
How Sync Works
At a high level, sync compares “fingerprints” of records between the source and Entegrata, then only pulls the records whose fingerprints don’t match.
Group records into buckets
Both the source and Entegrata partition records into buckets. Each bucket holds a subset of rows — for example, one bucket per day, or one bucket per range of primary keys.
Compare bucket fingerprints
For each bucket, Entegrata computes a fingerprint (a combined hash of all record data) on both sides. Buckets with matching fingerprints are skipped — the data is already in sync.
Drill down into mismatched buckets
When a bucket’s fingerprint differs, sync progressively narrows down to find the smallest subset of records where the difference lives (e.g., Year → Month → Day).
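The bucket comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of SHA-256, and the XOR combination of row hashes are hypothetical choices for the sketch, not a description of how Entegrata computes fingerprints internally. In a real sync the per-bucket fingerprints would be computed inside each database rather than in memory.

```python
import hashlib
from collections import defaultdict

def row_hash(row: dict) -> int:
    """Hash every column of one record, in a fixed column order."""
    canonical = "|".join(f"{k}={row[k]!r}" for k in sorted(row))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def bucket_fingerprints(rows, bucket_of) -> dict:
    """Combine row hashes per bucket with XOR, so row order does not matter.

    XOR is an assumption made for this sketch; duplicate rows would cancel out.
    """
    fingerprints = defaultdict(int)
    for row in rows:
        fingerprints[bucket_of(row)] ^= row_hash(row)
    return dict(fingerprints)

def mismatched_buckets(source_rows, target_rows, bucket_of) -> list:
    """Return the buckets whose fingerprints differ or exist on one side only."""
    src = bucket_fingerprints(source_rows, bucket_of)
    tgt = bucket_fingerprints(target_rows, bucket_of)
    return sorted(b for b in src.keys() | tgt.keys() if src.get(b) != tgt.get(b))

# Hypothetical data: one bucket per range of ten primary keys.
def by_key_range(row):
    return row["id"] // 10

source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 11, "name": "c"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 11, "name": "c"}]

print(mismatched_buckets(source, target, by_key_range))  # [0]
```

Only bucket 0 is flagged, because the record with id 2 differs; bucket 1 matches on both sides and would be skipped.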
Sync Modes
Sync operates in two different modes depending on how it’s scheduled and whether the resource has a date field available.
Full Sync (Standalone Operation)
Full Sync is a standalone, on-demand operation. It runs separately from your normal collection schedule and is typically triggered manually or on a less-frequent cadence (e.g., weekly). Full Sync uses the resource’s primary key (or unique identifier) to group records into buckets and compares the entire table across the source and Entegrata.
Best for:
- Tables without a reliable cursor field AND without a reliable date field
- Periodic verification that a full table is in sync
- Reconciling after an outage or incident
Full Sync uses a hash of the primary key to assign each record to a bucket. This means a record always falls in the same bucket, regardless of how its values change — keeping the comparison stable.
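As a rough illustration of why hashing the primary key keeps bucket assignment stable, consider the sketch below. The bucket count, the choice of SHA-256, and the function name are assumptions made for the example; the actual hash and bucket count used by Full Sync are not specified here.

```python
import hashlib

NUM_BUCKETS = 256  # hypothetical bucket count, chosen only for this sketch

def bucket_for_key(primary_key, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a primary key to a bucket using a deterministic hash of the key only."""
    digest = hashlib.sha256(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

# The same record before and after an update: only non-key columns changed.
before = {"id": 42, "status": "open", "amount": 100}
after = {"id": 42, "status": "closed", "amount": 125}

# Both versions land in the same bucket because only "id" feeds the hash.
same_bucket = bucket_for_key(before["id"]) == bucket_for_key(after["id"])
print(same_bucket)  # True
```

The sketch uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and would not give the same bucket on both sides of a comparison.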
Change Data Load (Scheduled Load Type)
Change Data Load is a load type (selected on the resource’s Collection Settings page, alongside Full Load and Incremental Load). Unlike standalone sync, Change Data Load runs on the resource’s normal collection schedule: every time the resource collects, it performs a sync instead of a traditional full or incremental load. Change Data Load requires a date field on the resource (e.g., created_date, transaction_date) and groups records by Year → Month → Day. This is the most efficient sync mode because changes in real-world data tend to cluster around recent dates.
Best for:
- Tables without a reliable cursor field for incremental loading, but WITH a stable date field
- Large historical tables where most records are older and unlikely to change
- Scheduled catch-up on drift (configured once, runs automatically)
| Level | What it compares |
|---|---|
| Year | All records in each year |
| Month | Within mismatched years, compares each month |
| Day | Within mismatched months, compares each day |
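The Year → Month → Day drill-down in the table above might look like the following in-memory sketch. It assumes a hypothetical `created_date` column holding `datetime.date` values and reuses an XOR-of-hashes fingerprint purely for illustration; a real implementation would presumably compute the grouped fingerprints in SQL on each side.

```python
import hashlib
from datetime import date

def fingerprint(rows) -> tuple:
    """Fingerprint a group of rows as (row count, XOR of row hashes)."""
    acc = 0
    for row in rows:
        canonical = "|".join(f"{k}={row[k]!r}" for k in sorted(row))
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return (len(rows), acc)

# One key function per drill-down level: Year, then Month, then Day.
LEVELS = [
    lambda d: (d.year,),
    lambda d: (d.year, d.month),
    lambda d: (d.year, d.month, d.day),
]

def group_under(rows, level_key, prefix, date_field):
    """Group rows by this level's key, keeping only rows under the given prefix."""
    groups = {}
    for row in rows:
        key = level_key(row[date_field])
        if key[:len(prefix)] == prefix:
            groups.setdefault(key, []).append(row)
    return groups

def drifted_days(source, target, date_field="created_date") -> list:
    """Return the (year, month, day) buckets whose fingerprints differ."""
    candidates = [()]  # the empty prefix stands for the whole table
    for level_key in LEVELS:
        next_candidates = []
        for prefix in candidates:
            src = group_under(source, level_key, prefix, date_field)
            tgt = group_under(target, level_key, prefix, date_field)
            for key in sorted(src.keys() | tgt.keys()):
                if fingerprint(src.get(key, [])) != fingerprint(tgt.get(key, [])):
                    next_candidates.append(key)
        candidates = next_candidates
    return candidates

source = [
    {"id": 1, "created_date": date(2023, 5, 1), "amount": 10},
    {"id": 2, "created_date": date(2024, 3, 5), "amount": 20},
    {"id": 3, "created_date": date(2024, 3, 9), "amount": 30},
]
target = [
    {"id": 1, "created_date": date(2023, 5, 1), "amount": 10},
    {"id": 2, "created_date": date(2024, 3, 5), "amount": 20},
    {"id": 3, "created_date": date(2024, 3, 9), "amount": 99},
]

print(drifted_days(source, target))  # [(2024, 3, 9)]
```

Here 2023 matches at the Year level and is never inspected again, which is where the savings on large historical tables come from.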
When To Use Sync
Sync is designed for tables where Incremental Load is not viable because the table lacks a reliable cursor field.
Decision guide
| Table has a reliable cursor field (UPDATED_AT, LAST_MODIFIED)? | Has a reliable date field (e.g., created_date)? | Recommended approach |
|---|---|---|
| Yes | — | Incremental Load (sync is unnecessary) |
| No | Yes | Change Data Load (scheduled sync with date-based drill-down) |
| No | No | Full Sync (standalone, run on-demand) + Full Load on a less frequent cadence |
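For teams that audit many resources at once, the decision guide above can be expressed as a small helper. The function name and return strings are hypothetical and exist only to restate the table; they are not part of any Entegrata API.

```python
def recommend_approach(has_cursor_field: bool, has_date_field: bool) -> str:
    """Restate the decision guide: cursor field first, then date field."""
    if has_cursor_field:
        # A reliable cursor makes sync unnecessary, whatever the date field.
        return "Incremental Load"
    if has_date_field:
        return "Change Data Load"
    return "Full Sync + Full Load on a less frequent cadence"

# Hypothetical inventory of tables: (has cursor field, has date field).
tables = {
    "invoices": (True, True),
    "time_entries": (False, True),
    "lookup_codes": (False, False),
}
recommendations = {name: recommend_approach(*flags) for name, flags in tables.items()}
print(recommendations)
```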
Typical scenarios
Tables without cursor fields
When a SQL table has no UPDATED_AT or LAST_MODIFIED field, use Change Data Load or Full Sync instead of repeated full refreshes.
After outages
Run Full Sync after a source system outage or Entegrata downtime to reconcile any gaps.
Large historical tables
Use Change Data Load on large tables with a stable date field — only recent partitions get compared in depth.
Verification
Run Full Sync before important downstream processes to verify the data is fully in sync.
When NOT to use sync
- Tables with a reliable cursor field — use Incremental Load instead; it’s dramatically lighter
- Small tables — a Full Load is simpler and not much more expensive
- Tables that change constantly across all records — most buckets will be flagged, and sync provides little benefit over a full refresh
- API-based data sources — sync is only available for SQL data sources
What Sync Does vs. Doesn’t Do
Detects changed records
Sync finds records whose values differ between source and Entegrata
Detects new records
Records that exist in the source but not in Entegrata are collected
Handles ALL data fields
Sync compares every column — not just the cursor field — so it catches updates that incremental loads would miss
Only writes true changes
Unchanged records that happen to share a bucket with changed ones are discarded before writing
Does NOT detect deletes
Records removed from the source require the separate Delete Operation
Does NOT replace normal loads
Standalone Full Sync runs in addition to the resource’s scheduled Full Load or Incremental Load; it does not replace them. (Change Data Load, by contrast, is itself a load type.)
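The "only writes true changes" behavior listed above can be pictured as a row-level comparison inside a flagged bucket. The sketch below is an assumption about how such filtering could work, with hypothetical function names and an `id` key column; it also shows why deletes go unnoticed, since rows that exist only on the Entegrata side are never examined.

```python
import hashlib

def row_hash(row: dict) -> str:
    """Hash every column of one record, in a fixed column order."""
    canonical = "|".join(f"{k}={row[k]!r}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def rows_to_write(source_bucket, target_bucket, key="id") -> list:
    """Keep only source rows that are new or whose contents differ."""
    existing = {row[key]: row_hash(row) for row in target_bucket}
    return [row for row in source_bucket if existing.get(row[key]) != row_hash(row)]

# Hypothetical contents of one flagged bucket on each side.
source_bucket = [
    {"id": 1, "name": "unchanged"},
    {"id": 2, "name": "updated"},
    {"id": 3, "name": "brand new"},
]
target_bucket = [
    {"id": 1, "name": "unchanged"},
    {"id": 2, "name": "old value"},
    {"id": 9, "name": "deleted at source"},
]

changed = rows_to_write(source_bucket, target_bucket)
print([row["id"] for row in changed])  # [2, 3]
```

Record 1 shares the bucket but is discarded because its contents match, and record 9 is left alone because this comparison only iterates over source rows.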
Performance Characteristics
Sync is designed to be efficient on large tables by comparing fingerprints rather than pulling data upfront.
| Scenario | What sync does |
|---|---|
| Few changes | Only a handful of buckets are flagged; very fast |
| Many changes | More buckets are flagged; sync may approach the cost of a full refresh |
| No changes | Sync completes quickly after the first comparison round with zero data pulled |
Limitations
A few data types and scenarios are not fully supported by sync:
- Binary columns (VARBINARY, BINARY, IMAGE) are excluded from the comparison
- Non-ASCII Unicode text in certain string columns may produce false positives; this is not an issue for data containing only standard characters
- Extremely large tables with widespread changes — if most of the table has drifted, sync will flag most buckets and the operation becomes similar in cost to a full refresh
Related Topics
Collection Settings
Configure load types, schedules, and filters for resources
Triggering Collection
Manually run collection jobs on demand
Monitoring Jobs
Track sync execution and performance
Troubleshooting
Resolve resource collection issues
