Introduction
The Entegrata Pipeline system enables you to create, configure, and manage data pipelines that transform and load data from your Collector sources into the Entegrata platform. Pipelines provide a framework for defining how your source data maps to standardized data types, and they let you test configurations before deploying them to production.

Pipelines are the foundation of your data integration strategy: they define the transformation logic that converts raw data from your sources into structured, queryable data types.
What are Pipelines?
Pipelines in Entegrata serve two primary functions:

- Data Orchestration - Define when and how your data processing jobs execute
- Data Mapping - Configure the transformation logic that maps source data to canonical data types

Each data mapping defines:

- Which data sources to pull from
- How fields map between source and target schemas
- Transformation logic for data cleansing and enrichment
- Default values and data quality rules
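The pieces above could be expressed in a mapping configuration along the following lines. This is a hypothetical sketch for illustration only; the keys and structure are assumptions, not Entegrata's actual configuration schema.

```python
# Hypothetical sketch of a data mapping configuration.
# Key names and structure are illustrative, not Entegrata's real schema.
mapping = {
    "source": "crm_contacts",            # which data source to pull from
    "target_type": "Contact",            # the canonical data type to populate
    "fields": {
        # field mappings, including transformation expressions
        "email": "COALESCE(work_email, personal_email)",
        "full_name": "CONCAT(first_name, ' ', last_name)",
        # a field with a default value applied when the source is empty
        "country": {"source_field": "country", "default": "US"},
    },
    # data quality rules that rows must satisfy
    "quality_rules": ["email IS NOT NULL"],
}

print(mapping["target_type"])  # Contact
```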
Key Features
Pipeline Management
Create, edit, duplicate, and delete pipelines with full version control and audit history
Visual Mapping Editor
Use the intuitive drag-and-drop interface to map source fields to data type fields
Test Before Deploy
Run pipelines in test mode to validate mappings without affecting production data
Scheduled Execution
Configure pipelines to run automatically on schedules or trigger them manually
Multi-Source Support
Map data from multiple sources into a single data type with primary and related sources
Advanced Transformations
Use COALESCE, CONCAT, CASE statements, and custom SQL for complex field mappings
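To illustrate the kinds of expressions these transformations involve, here is a minimal sketch using an in-memory SQLite database as a stand-in for a real source table. The table and column names are invented for the example; note that SQLite spells string concatenation as `||` rather than `CONCAT`.

```python
import sqlite3

# Stand-in source table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE source_contacts (
        first_name TEXT, last_name TEXT,
        work_email TEXT, personal_email TEXT,
        status_code INTEGER
    )
""")
conn.execute(
    "INSERT INTO source_contacts VALUES (?, ?, ?, ?, ?)",
    ("Ada", "Lovelace", None, "ada@example.com", 1),
)

row = conn.execute("""
    SELECT
        -- COALESCE: fall back to the first non-NULL source field
        COALESCE(work_email, personal_email)       AS email,
        -- concatenation: combine fields (SQLite uses || for CONCAT)
        first_name || ' ' || last_name             AS full_name,
        -- CASE: translate source codes into canonical values
        CASE status_code WHEN 1 THEN 'active'
                         ELSE 'inactive' END       AS status
    FROM source_contacts
""").fetchone()

print(row)  # ('ada@example.com', 'Ada Lovelace', 'active')
```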
Pipeline Workflow
The typical pipeline workflow follows these stages:

Map Fields
Use the visual editor to map source fields to data type fields, applying transformations as needed
Pipeline vs. Data Mapping
It’s important to understand the relationship between pipelines and data mappings:

- Pipeline - The container that organizes and schedules data processing jobs. A pipeline can include multiple data type mappings.
- Data Mapping - The specific configuration that maps one or more source tables to a single data type (entity).
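The one-to-many relationship described above can be sketched roughly as follows. The class and attribute names here are hypothetical, chosen only to mirror the concepts; they are not part of Entegrata's API.

```python
# Illustrative sketch of the pipeline/mapping relationship.
# Class and attribute names are hypothetical, not Entegrata's API.
from dataclasses import dataclass, field

@dataclass
class DataMapping:
    target_type: str          # the single data type (entity) this mapping produces
    source_tables: list[str]  # one or more source tables feeding it

@dataclass
class Pipeline:
    name: str
    schedule: str                                # when processing jobs execute
    mappings: list[DataMapping] = field(default_factory=list)

# One pipeline can carry several data type mappings.
p = Pipeline(name="nightly_load", schedule="0 2 * * *")
p.mappings.append(DataMapping("Contact", ["crm_contacts"]))
p.mappings.append(DataMapping("Account", ["crm_accounts", "billing_accounts"]))

print(len(p.mappings))  # 2
```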
Getting Started
Pipeline Management
Learn how to create and manage pipelines
Data Mapping
Explore the visual mapping editor and field configuration
Creating Pipelines
Step-by-step guide to creating your first pipeline
Running Pipelines
Execute and monitor pipeline jobs
