
Overview

Data sources are the foundation of your data pipelines. In Entegrata, you connect to various data sources, select the specific tables or datasets you need, and configure how they relate to each other. This guide covers everything you need to know about working with data sources in the mapping editor.

Understanding Source Types

Primary Source

The primary source is the main source of data for your entity:
  • Determines the base set of records
  • All other sources are joined to this source
  • Each entity must have exactly one primary source
  • The first source you add becomes the primary source
Example: For a Customer entity, your CRM system’s customer table would be the primary source.
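The role rules above can be sketched as a tiny model. This is a minimal illustration that rests on assumptions: `EntityMapping` is a hypothetical class, not part of Entegrata, and the table names are made up. It encodes only the rule that the first source added is primary and every later source is related.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntityMapping:
    """Hypothetical model of an entity's sources (illustration only, not an Entegrata API)."""

    name: str
    sources: List[str] = field(default_factory=list)

    def add_source(self, table: str) -> str:
        """Record a source and return its role; only the first source added is primary."""
        self.sources.append(table)
        return "primary" if len(self.sources) == 1 else "related"

    @property
    def primary_source(self) -> Optional[str]:
        # Once any source exists, the entity has exactly one primary source.
        return self.sources[0] if self.sources else None

    @property
    def related_sources(self) -> List[str]:
        # Every source after the first is joined to enrich the primary data.
        return self.sources[1:]


customer = EntityMapping("Customer")
print(customer.add_source("crm.customer"))  # primary
print(customer.add_source("crm.client"))    # related
print(customer.primary_source)              # crm.customer
```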
Related Sources

Related sources are additional data sources joined to enrich your primary data:
  • Can have multiple related sources per entity
  • Joined using one or more key fields (foreign key relationships)
  • Can come from different systems or databases
Example: For a Customer entity, you might join:
  • Client table (to get the customer's client number)
  • Marketing engagement table (to get campaign responses)

Adding a Primary Source

1

Open the Mapping Editor

Navigate to your entity in the data pipeline section and open the mapping editor.
2

Open Source Context Panel

Click any empty area of the editor (anywhere that isn't a node or connection) to open the source catalog context panel.
3

Select Source

Search for and choose a source from your connected sources:
  • Database connections
  • File uploads
  • API connections
  • External systems
4

Configure Source Identifier

The first source you add automatically becomes your primary source. It determines the base set of records, and all other sources are joined to it.
Configure the identifier: the field used to uniquely identify records from this source across all potentially related sources.
  • If a primary key is set up for the source, it is used as the default
  • You can override the identifier with any field on the source
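The defaulting rule for identifiers can be sketched as follows. This assumes a hypothetical `resolve_identifier` helper (not an Entegrata API) and made-up field names: an explicit override wins, otherwise the source's primary key is used, and a source with neither cannot be identified.

```python
from typing import Optional, Sequence, Tuple


def resolve_identifier(
    source_fields: Sequence[str],
    primary_key: Optional[Sequence[str]] = None,
    override: Optional[Sequence[str]] = None,
) -> Tuple[str, ...]:
    """Pick the identifier field(s) for a source (hypothetical helper, illustration only).

    An explicit override takes precedence; otherwise fall back to the primary key.
    """
    chosen = override if override else primary_key
    if not chosen:
        raise ValueError("No primary key is set up; choose identifier field(s) explicitly")
    # The identifier must consist of fields that actually exist on the source.
    missing = [f for f in chosen if f not in source_fields]
    if missing:
        raise ValueError(f"Identifier fields not on source: {missing}")
    return tuple(chosen)


fields = ["customer_id", "email", "name"]
print(resolve_identifier(fields, primary_key=["customer_id"]))                      # ('customer_id',)
print(resolve_identifier(fields, primary_key=["customer_id"], override=["email"]))  # ('email',)
```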
Adding Related Sources

Related sources are joined to your primary source to enrich data.
1

Open Source Context Panel

Click any empty area of the editor (anywhere that isn't a node or connection) to open the source catalog context panel.
2

Select Source

Search for and choose a source from your connected sources:
  • Database connections
  • File uploads
  • API connections
  • External systems
3

Configure Identifiers

Any source added after the first becomes a related source.
Each related source needs join logic that connects it to the primary source through one or more of its fields.
Configure the primary key for the source if it isn't already set up in the source system. The primary key is required to uniquely identify records from the source.
Configure how data from this source relates to the primary source using fields from both sources:
  • Select a field on this source whose values uniquely relate to records in the primary source.
  • Select the field on the primary source whose values match the selected field on this source.
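In effect, the two field selections define an equality join condition. The sketch below is a minimal, standard-library illustration of that matching, assuming records are Python dictionaries; `join_related`, its `prefix` parameter, and the sample fields are hypothetical and not how Entegrata implements joins. It keeps every primary record and leaves out related records that have no match.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def join_related(
    primary: List[Record],
    related: List[Record],
    primary_field: str,
    related_field: str,
    prefix: str,
) -> List[Record]:
    """Left-join `related` onto `primary` where primary[primary_field] == related[related_field].

    Hypothetical helper for illustration only. Every primary record is kept:
    no match yields None for the related columns, and several matches yield
    several output rows. Null join values never match.
    """
    # Index related records by join value so each lookup is a dictionary hit.
    index: Dict[Any, List[Record]] = defaultdict(list)
    for rec in related:
        if rec[related_field] is not None:
            index[rec[related_field]].append(rec)

    related_columns = sorted({col for rec in related for col in rec})
    joined: List[Record] = []
    for rec in primary:
        matches = index.get(rec[primary_field], [])
        if not matches:
            # Keep the primary record; its related columns stay empty.
            joined.append({**rec, **{prefix + col: None for col in related_columns}})
        for match in matches:
            joined.append({**rec, **{prefix + col: val for col, val in match.items()}})
    return joined


customers = [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]
clients = [{"cust_ref": 1, "client_number": "C-100"}, {"cust_ref": 3, "client_number": "C-300"}]

for row in join_related(customers, clients, "customer_id", "cust_ref", prefix="client."):
    print(row)
```

Here Globex is kept with `None` for its client fields, and the client row with `cust_ref` 3 does not appear because no customer matches it.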
Join Best Practices
  • Use exact match joins when possible (e.g., ID fields)
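  • Entegrata preserves all primary source records, but not necessarily all related source records. In SQL terms this behaves like a left join: related records with no matching primary record are not carried into the entity.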
  • For complex or ambiguous cases, multiple fields can be used in combination to uniquely identify records.
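When no single field is unique, a combination of fields often is. A minimal sketch, assuming a hypothetical `is_unique` helper and made-up client and matter fields, that checks whether a set of fields identifies each record exactly once:

```python
from typing import Any, Dict, List, Sequence


def is_unique(records: List[Dict[str, Any]], fields: Sequence[str]) -> bool:
    """Return True if the combined values of `fields` identify each record uniquely."""
    keys = [tuple(rec[f] for f in fields) for rec in records]
    return len(keys) == len(set(keys))


matters = [
    {"client_number": "C-100", "matter_number": "001"},
    {"client_number": "C-100", "matter_number": "002"},
    {"client_number": "C-200", "matter_number": "001"},
]
print(is_unique(matters, ["client_number"]))                   # False
print(is_unique(matters, ["matter_number"]))                   # False
print(is_unique(matters, ["client_number", "matter_number"]))  # True
```

Neither field alone works here, but the pair does, which is the situation where a multi-field identifier is worth configuring.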

Configuring Source Properties

Click on any source node to access its properties panel.

Basic Properties

  • Connection: The data connection being used (read-only)
  • Table/Dataset: The specific table or dataset (read-only)
  • Primary Key: The field(s) used to uniquely identify a record in the source system.
  • Entegrata Identifier: The field(s) used to join records across other data sources in the Entegrata system.
When working with many related sources:

Nested Joins

Related sources can join to other related sources (not just the primary source):
1

Add First Related Source

Join a related source to your primary source.
2

Add Second Related Source

When configuring the join, select the first related source instead of the primary source.
3

Configure Join

Set up the join condition between the two related sources using the identifier configuration.
Example:
  • Primary: Customers
  • Related 1: Orders (joined to Customers)
  • Related 2: OrderItems (joined to Orders, not Customers)
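Nested joins form a tree rooted at the primary source, so each source can only be joined after the source it points to. A minimal sketch, assuming a hypothetical join plan that maps each source to its parent source and join fields (not Entegrata's internal format; the `customer_id` and `order_id` field names are made up), computes a valid join order for the example above:

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical join plan: source -> (parent source, parent field, own field).
# The primary source has no parent.
JoinPlan = Dict[str, Optional[Tuple[str, str, str]]]

plan: JoinPlan = {
    "Customers": None,
    "Orders": ("Customers", "customer_id", "customer_id"),
    "OrderItems": ("Orders", "order_id", "order_id"),  # joins Orders, not Customers
}


def join_order(plan: JoinPlan) -> List[str]:
    """Return sources in an order where every parent is joined before its children."""
    roots = [s for s, parent in plan.items() if parent is None]
    if len(roots) != 1:
        raise ValueError("An entity needs exactly one primary source")
    ordered: List[str] = []
    visiting: Set[str] = set()

    def visit(source: str) -> None:
        if source in ordered:
            return
        if source in visiting:
            raise ValueError(f"Join cycle involving {source}")
        visiting.add(source)
        parent = plan[source]
        if parent is not None:
            if parent[0] not in plan:
                raise ValueError(f"{source} joins to unknown source {parent[0]}")
            visit(parent[0])  # the parent must be joined first
        visiting.discard(source)
        ordered.append(source)

    for source in plan:
        visit(source)
    return ordered


for source in join_order(plan):
    parent = plan[source]
    if parent is None:
        print(f"{source}: primary source")
    else:
        parent_source, parent_field, own_field = parent
        print(f"{source}: join to {parent_source} on {parent_source}.{parent_field} = {source}.{own_field}")
```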

Troubleshooting

Source Connection Failed

Issue: Cannot connect to the data source. Solutions:
  • Verify the connection credentials are current
  • Check network connectivity
  • Ensure you have permission to access the source
  • Contact your administrator to refresh the connection

No Tables Visible

Issue: Can’t see any tables or datasets in the source. Solutions:
  • Verify you have read permissions
  • Check if you’re looking in the correct schema/database
  • Refresh the connection in the admin portal
  • Some sources may require specific catalog configuration

Too Many Records After Join

Issue: Join produces more records than expected. Solutions:
  • Check for duplicate values in join keys
  • Verify you’re joining on the correct fields
  • Look for one-to-many relationships
  • Add more join conditions to make the relationship unique
  • Consider if this is actually correct (e.g., one customer, many orders)
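To find the keys behind an inflated record count, count how many related rows share each join value. A minimal standard-library sketch; the `fan_out_report` helper and the order fields are hypothetical:

```python
from collections import Counter
from typing import Any, Dict, List


def fan_out_report(related: List[Dict[str, Any]], join_field: str) -> Dict[Any, int]:
    """Return join values that appear more than once in `related`, with their counts.

    A primary record matching such a value becomes that many rows after the join.
    """
    counts = Counter(rec[join_field] for rec in related)
    return {key: n for key, n in counts.items() if n > 1}


orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]
print(fan_out_report(orders, "customer_id"))  # {1: 2}
print(fan_out_report(orders, "order_id"))     # {}
```

An empty report means each primary record matches at most one related record on that field. A non-empty one is either a key problem or a genuine one-to-many relationship, such as one customer with many orders.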

Best Practices

Source Configuration
  • Apply filters at the source level whenever possible
  • Choose the most specific table/view available
  • Join only to tables that have at most one record for each primary source record
Data Quality
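  • Verify join keys have the expected cardinality (for example, unique on the related side when each primary record should match one related record)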
  • Check for nulls in join fields
  • Test edge cases (no matches, duplicates)
  • Validate against expected record counts
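The data quality checks above can be combined into a quick report for a single join. A minimal sketch, assuming records are dictionaries and a hypothetical `join_quality` helper (not an Entegrata feature); `joined_rows` is the row count a left join on these fields would produce:

```python
from collections import Counter
from typing import Any, Dict, List


def join_quality(
    primary: List[Dict[str, Any]],
    related: List[Dict[str, Any]],
    primary_field: str,
    related_field: str,
) -> Dict[str, int]:
    """Summarize join-key health for one primary-to-related join (hypothetical helper)."""
    primary_keys = [rec[primary_field] for rec in primary]
    related_keys = [rec[related_field] for rec in related]
    # How many related rows carry each non-null join value.
    related_counts = Counter(k for k in related_keys if k is not None)
    primary_key_set = {k for k in primary_keys if k is not None}
    return {
        "primary_records": len(primary),
        "primary_null_keys": sum(k is None for k in primary_keys),
        "related_null_keys": sum(k is None for k in related_keys),
        # Primary records a left join keeps with empty related fields.
        "primary_without_match": sum(related_counts[k] == 0 for k in primary_keys),
        # Related rows that never reach the output.
        "related_without_match": sum(k not in primary_key_set for k in related_keys),
        # Rows a left join produces: one per match, or one when there is no match.
        "joined_rows": sum(max(1, related_counts[k]) for k in primary_keys),
    }


customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": None}]
orders = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 3}, {"customer_id": None}]
print(join_quality(customers, orders, "customer_id", "customer_id"))
```

Here `joined_rows` is 4 rather than 3 because customer 1 has two orders, so comparing it with `primary_records` flags the fan-out before it reaches the entity.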