Introduction

The Entegrata Pipeline system enables you to create, configure, and manage data pipelines that transform and load data from your Collector sources into the Entegrata platform. Pipelines provide a framework for defining how your source data maps to standardized data types and let you test configurations before deploying them to production.
Pipelines are the foundation of your data integration strategy. They define the transformation logic that converts raw data from your sources into structured, queryable data types.

What are Pipelines?

Pipelines in Entegrata serve two primary functions:
  1. Data Orchestration - Define when and how your data processing jobs execute
  2. Data Mapping - Configure the transformation logic that maps source data to canonical data types
Each pipeline contains one or more data type mappings that specify:
  • Which data sources to pull from
  • How fields map between source and target schemas
  • Transformation logic for data cleansing and enrichment
  • Default values and data quality rules
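As a rough illustration only, the elements just listed might be captured in a structure like the one below; the keys and field names are assumptions made for this sketch, not the actual Entegrata mapping schema.

```python
# Hypothetical data type mapping; keys and names are illustrative only.
client_mapping = {
    "data_type": "Client",                        # target canonical data type
    "source": "crm.accounts",                     # source table to pull from
    "field_map": {                                # target field -> source column
        "client_id": "account_id",
        "client_name": "account_name",
    },
    "defaults": {"status": "active"},             # default values
    "quality_rules": ["client_id IS NOT NULL"],   # data quality checks
}
```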

Key Features

Pipeline Management

Create, edit, duplicate, and delete pipelines with full version control and audit history

Visual Mapping Editor

Use the intuitive drag-and-drop interface to map source fields to data type fields

Test Before Deploy

Run pipelines in test mode to validate mappings without affecting production data

Scheduled Execution

Configure pipelines to run automatically on schedules or trigger them manually
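As an illustration, schedules are typically expressed with a cron-style expression; the keys below are assumptions made for this sketch, not the actual Entegrata trigger format.

```python
# Hypothetical trigger configurations; keys are illustrative only.
nightly_trigger = {
    "type": "schedule",
    "cron": "0 2 * * *",    # every day at 02:00
    "timezone": "UTC",
}
manual_trigger = {"type": "manual"}   # run on demand
```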

Multi-Source Support

Map data from multiple sources into a single data type with primary and related sources
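For example, a Client record might draw most of its fields from a primary CRM table and enrich them from a related table. The query below is a sketch with assumed table and column names, not a query generated by Entegrata.

```python
# Hypothetical join of a primary source and a related source into one
# Client record; table and column names are assumptions.
client_query = """
SELECT
    a.account_id   AS client_id,
    a.account_name AS client_name,
    o.email        AS owner_email
FROM crm.accounts a                 -- primary source
LEFT JOIN crm.account_owners o      -- related source
    ON o.account_id = a.account_id
"""
```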

Advanced Transformations

Use COALESCE, CONCAT, CASE statements, and custom SQL for complex field mappings
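To make this concrete, the sketch below shows what such expressions can look like when assigned to target fields; the source column names are assumptions, and the exact syntax accepted by the mapping editor may differ.

```python
# Hypothetical field-mapping expressions; source column names are illustrative.
field_expressions = {
    # Prefer the display name, falling back to the legal name
    "display_name": "COALESCE(preferred_name, legal_name)",
    # Build one address line from its parts
    "address_line": "CONCAT(street, ', ', city, ', ', postal_code)",
    # Normalize raw status codes into a canonical value
    "status": (
        "CASE WHEN status_code IN ('A', 'ACT') THEN 'active' "
        "WHEN status_code IN ('I', 'INA') THEN 'inactive' "
        "ELSE 'unknown' END"
    ),
}
```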

Pipeline Workflow

The typical pipeline workflow follows these stages:
  1. Create Pipeline - Set up a new pipeline with a name, description, and trigger configuration
  2. Add Data Types - Define which data types (entities) this pipeline will process
  3. Configure Sources - Connect to your Collector data sources and select the tables or resources to use
  4. Map Fields - Use the visual editor to map source fields to data type fields, applying transformations as needed
  5. Test Execution - Run the pipeline in dry-run mode to validate your mappings and catch errors
  6. Deploy - Publish your pipeline to production, generating optimized DLT scripts
  7. Monitor - Track pipeline execution status, view logs, and troubleshoot issues
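To illustrate the test stage (step 5 above) conceptually: a dry run applies the mapping to a handful of sample rows and reports rule violations without writing anything to production. The self-contained sketch below is a stand-in for that idea, not Entegrata's actual test mode.

```python
# Conceptual dry-run check: map sample rows and report quality-rule
# violations without loading anything. All names are illustrative.
sample_rows = [
    {"account_id": "A-100", "account_name": "Acme Ltd"},
    {"account_id": None, "account_name": "Missing-ID Corp"},
]
field_map = {"client_id": "account_id", "client_name": "account_name"}

def dry_run(rows, field_map):
    errors = []
    for i, row in enumerate(rows):
        mapped = {target: row.get(source) for target, source in field_map.items()}
        if mapped["client_id"] is None:  # rule: client_id IS NOT NULL
            errors.append(f"row {i}: client_id is null")
    return errors

print(dry_run(sample_rows, field_map))  # ['row 1: client_id is null']
```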

Pipeline vs. Data Mapping

It’s important to understand the relationship between pipelines and data mappings:
  • Pipeline - The container that organizes and schedules data processing jobs. A pipeline can include multiple data type mappings.
  • Data Mapping - The specific configuration that maps one or more source tables to a single data type (entity).
Think of a pipeline as a project that groups related data mappings together. For example, you might have a “Client Data Processing Pipeline” that includes mappings for Client, Contact, and Address data types.
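Continuing that example, the relationship can be pictured roughly as follows; the structure and names are assumptions made for illustration, not the actual Entegrata pipeline format.

```python
# Hypothetical pipeline grouping three data type mappings; illustrative only.
client_data_pipeline = {
    "name": "Client Data Processing Pipeline",
    "trigger": {"type": "schedule", "cron": "0 2 * * *"},
    "mappings": [
        {"data_type": "Client", "primary_source": "crm.accounts"},
        {"data_type": "Contact", "primary_source": "crm.contacts"},
        {"data_type": "Address", "primary_source": "crm.addresses"},
    ],
}
```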

Getting Started

Common Use Cases

Daily Client Data Refresh

Create a scheduled pipeline that runs nightly to pull updated client information from your CRM, mapping it to standardized Client and Contact data types.

Historical Data Migration

Build a one-time pipeline to migrate legacy data, using complex field mappings to handle data quality issues and schema differences.

Multi-System Integration

Configure pipelines that combine data from multiple sources (e.g., CRM, ERP, billing systems) into unified data types for comprehensive reporting.

Incremental Updates

Set up event-driven pipelines that process only new or changed records, ensuring your Entegrata data stays current with minimal processing overhead.
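As a generic illustration of the incremental pattern, each run selects only rows changed since a persisted high-water mark; the table, column, and timestamp format below are assumptions.

```python
# Generic high-water-mark sketch: select only rows modified since the
# last successful run. Names and timestamp format are illustrative.
last_run = "2024-01-01T00:00:00"   # persisted after the previous run

incremental_query = f"""
SELECT *
FROM crm.accounts
WHERE last_modified > '{last_run}'
"""
```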

Next Steps

Ready to start building pipelines? Choose your path: