What is the Collector?

The Collector is Entegrata’s core data ingestion system that automatically connects to your data sources, discovers available data, and synchronizes it to your data lakehouse. The Collector manages the entire data pipeline from source systems to your analytics-ready tables.

Key Concepts

Connections

A Connection represents Entegrata’s link to a data source. Each connection includes:
  • Authentication credentials - Secure access to your data source
  • Collection schedule - How often data is synchronized
  • Connection-level settings - Default collection behaviors for all resources

Resources

A Resource is a table, view, API endpoint, or file within a connection that Entegrata collects. Each resource can have:
  • Individual collection settings - Override connection-level defaults
  • Load type - Full load or incremental synchronization
  • Unique keys - Fields that identify unique records
  • Filters - Rules to limit what data is collected

Resource Hierarchy

Some resources have nested sub-resources (child resources). For example:
  • A database table might have related child tables
  • An API endpoint might have nested data structures
  • A file might contain multiple sheets or sections
Sub-resources inherit collection settings from their parent and cannot be scheduled independently.
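
The way settings resolve across connections, resources, and sub-resources can be sketched as follows. This is a minimal illustration, not Entegrata's implementation: the class names, the `effective_settings` helper, and the setting keys (`load_type`, `schedule_hours`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Connection:
    name: str
    # Connection-level defaults (hypothetical setting names).
    defaults: dict = field(default_factory=dict)


@dataclass
class Resource:
    name: str
    connection: Connection
    # Resource-level settings that override the connection defaults.
    overrides: dict = field(default_factory=dict)
    # Set for sub-resources; None for top-level resources.
    parent: Optional["Resource"] = None


def effective_settings(resource: Resource) -> dict:
    """Resolve the settings that apply to a resource."""
    if resource.parent is not None:
        # Assumption: a sub-resource simply uses its parent's resolved settings.
        return effective_settings(resource.parent)
    # Resource overrides win over connection-level defaults.
    return {**resource.connection.defaults, **resource.overrides}


conn = Connection("crm", {"load_type": "full", "schedule_hours": 24})
orders = Resource("orders", conn, {"load_type": "incremental"})
order_lines = Resource("order_lines", conn, parent=orders)
```

Here `orders` keeps the connection's schedule but overrides the load type, and `order_lines` resolves to the same settings as its parent.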

Jobs

A Job represents a single collection execution for a connection or resource. Jobs track:
  • Execution status (Running, Completed, Failed, Scheduled)
  • Records collected and processing speed
  • Duration and performance metrics
  • Error details if the job failed
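
As a rough mental model, a job record could be represented like this. The `Job` and `JobStatus` names and all field names are assumptions made for illustration; only the four status values come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class Job:
    status: JobStatus
    records_collected: int = 0
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    error: Optional[str] = None  # populated only when the job failed

    def duration_seconds(self) -> Optional[float]:
        """Elapsed time, or None if the job has not both started and finished."""
        if self.started_at is None or self.finished_at is None:
            return None
        return (self.finished_at - self.started_at).total_seconds()

    def records_per_second(self) -> Optional[float]:
        """Processing speed, or None if the duration is unknown or zero."""
        duration = self.duration_seconds()
        if not duration:
            return None
        return self.records_collected / duration
```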

Discovery

Discovery is the automated process where Entegrata:
  1. Connects to your data source
  2. Scans for available resources (tables, views, endpoints, files)
  3. Analyzes schema and structure
  4. Detects changes to existing resources
Discovery runs automatically every 3 hours and can also be triggered manually.
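
Step 4, detecting changes, amounts to comparing the latest scan against the previous one. The sketch below assumes each scan is summarized as a dictionary mapping resource names to their column types; that snapshot shape and the `diff_schema` function are hypothetical, not part of Entegrata's API.

```python
def diff_schema(previous: dict, current: dict) -> dict:
    """Compare two {resource_name: {column: type}} snapshots."""
    previous_names = set(previous)
    current_names = set(current)
    return {
        # Resources that appeared since the last scan.
        "added": sorted(current_names - previous_names),
        # Resources that are no longer present in the source.
        "removed": sorted(previous_names - current_names),
        # Resources present in both scans whose schema differs.
        "changed": sorted(
            name
            for name in previous_names & current_names
            if previous[name] != current[name]
        ),
    }


before = {
    "customers": {"id": "int", "name": "text"},
    "orders": {"id": "int"},
}
after = {
    "customers": {"id": "int", "name": "text", "email": "text"},
    "invoices": {"id": "int"},
}
changes = diff_schema(before, after)
```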

Collection Lifecycle

  1. Connect - Add a new connection by providing authentication credentials and connection details.
  2. Discover - Entegrata automatically discovers all available resources in your data source.
  3. Configure - Set up collection schedules, load types, and filters for your resources.
  4. Collect - Data is automatically synchronized according to your configured schedules.
  5. Monitor - Track job status, performance, and data quality through the admin portal.

Collection Schedules

Entegrata supports two scheduling approaches:

Interval-Based Scheduling

Collections run on a regular time interval (e.g., every 6 hours, daily at 2 AM). This is the most common approach for routine data synchronization.
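
To make the two examples concrete, the next run time for each style of interval can be computed as below. This is an illustrative sketch only; the function names are hypothetical and time zones are ignored for simplicity.

```python
from datetime import datetime, timedelta


def next_interval_run(last_run: datetime, every: timedelta) -> datetime:
    """Next run for a fixed interval such as 'every 6 hours'."""
    return last_run + every


def next_daily_run(now: datetime, hour: int) -> datetime:
    """Next run for a daily schedule such as 'daily at 2 AM'."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate
```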

Load Types

Full Load

Copies all data from the source every time. Use when:
  • Source data is small
  • Historical changes aren’t tracked
  • You need a complete snapshot each time

Incremental Load

Copies only new or changed data since the last collection. Requires:
  • An incremental load field (like modified_date or updated_at)
  • A source system that tracks when records change
Incremental loads are faster and more efficient for large datasets.
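
The general technique can be sketched in a few lines: keep the highest cursor value seen so far, collect only rows beyond it, and upsert them by unique key. The in-memory `target` dictionary, the `incremental_load` function, and the `id` / `updated_at` field names are assumptions for illustration and do not describe how Entegrata stores data.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_cursor,
                     cursor_field="updated_at", key="id"):
    """Upsert rows changed after last_cursor; return the new cursor value."""
    new_cursor = last_cursor
    for row in source_rows:
        if row[cursor_field] > last_cursor:
            # The unique key decides whether this is an insert or an update.
            target[row[key]] = row
            new_cursor = max(new_cursor, row[cursor_field])
    return new_cursor
```

The returned cursor is stored and passed as `last_cursor` on the next run, so each run starts where the previous one stopped.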

Time-Window Incremental Load

Works like an Incremental Load, but re-scans a bounded time window on the cursor field every run instead of picking up where the last run left off. The window can be:
  • Relative - e.g., the last 3 months, sliding forward with each run
  • Absolute - a fixed start and end date (useful for backfills of a specific period)
Use when the cursor field exists but isn’t fully reliable, for example when modified_date doesn’t always update on every change. Re-scanning the window each run catches records whose cursor was set late or not at all, without the cost of a Full Load.
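
The difference from a plain incremental load is that the window bounds, not the last saved cursor, decide which rows are re-read. A minimal sketch, assuming a relative window expressed as a number of days (90 days standing in for roughly 3 months) and hypothetical helper names:

```python
from datetime import datetime, timedelta


def relative_window(now: datetime, lookback: timedelta):
    """Relative window: slides forward because it is recomputed on every run."""
    return now - lookback, now


def rows_in_window(rows, start, end, cursor_field="modified_date"):
    """Select every row whose cursor value falls inside the window (inclusive)."""
    return [row for row in rows if start <= row[cursor_field] <= end]


# An absolute window is just a fixed pair of dates, e.g. for a backfill.
backfill_window = (datetime(2023, 1, 1), datetime(2023, 3, 31))
```

Because every run re-reads the whole window, rows are typically upserted by unique key so that repeated scans do not create duplicates.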

Collection Status

Connections and resources can be:
  • Active - Currently collecting data according to schedule
  • Inactive - Paused, not collecting data
You can toggle status to temporarily stop collection without deleting configuration.

Getting Started

  • Managing Connections - Learn how to add, update, and delete data source connections
  • Managing Resources - Configure resources, set schedules, and manage collection settings
  • Monitoring Jobs - Track collection progress and troubleshoot issues
  • Discovery - Understand how Entegrata discovers and tracks your data