Pipeline orchestration

Your pipelines. Built, tested, and deployed from one place.

Declarative YAML or a visual DAG editor. Incremental materialization, event triggers, backfills, and SLA alerts — all first-class.

Start free See example pipeline

dag: orders_daily.yaml

schedule: every 1h · last run: 2m ago · SLA: 12m p95 · retries: 2

// the problem

Airflow wasn't built for data. dbt stops at SQL. Prefect wants you to write Python.

Most pipeline tools force a tradeoff. Orchestrators treat every task like an opaque shell command — they don't know a schema from a socket. SQL-only frameworks skip the ingest, triggers, and backfills that real pipelines need. Python-first runners hand you a blank file and call that flexibility.

DXData starts from the other direction: a declarative pipeline format that understands tables, partitions, and time. Visual when you want it, code-first when you need it, and consistent all the way from laptop to production.

Declarative YAML

One manifest describes sources, transforms, schedule, tests, and SLA. Reviewable in a pull request.

Visual DAG editor

Drag-and-drop graph that round-trips to the same YAML. Product folks sketch, engineers review the diff.

Incremental by default

Watermarks are tracked for you. Only new partitions are recomputed — backfills stay cheap.

pipelines/daily_kpis.yaml

# pipelines/daily_kpis.yaml
name: daily_kpis
owner: data-platform
schedule: '0 * * * *'  # hourly
retries:
  max: 3
  backoff: exponential
sla:
  freshness: 30m
  runtime_p95: 12m
steps:
  - name: ingest
    source: postgres.orders
    mode: incremental
    watermark: updated_at
  - name: aggregate
    sql: ./daily_kpis.sql
    materialize: merge
  - name: publish
    sink: bi.daily_kpis
3 steps · compile: ok
schedule: hourly

// pipelines.as_code

YAML-first. Visual whenever you want it.

Every pipeline is a single declarative manifest. It lives in your repo, renders a diff on every pull request, and is reviewed the same way as any other code. The visual DAG editor writes the same format, so designers and engineers share a source of truth.

Because the manifest is plain data, you get all the usual tooling for free: search, lint, codegen, and AI review. No bespoke DSL, no proprietary bundle.

Pipelines-as-code — diff, review, and revert like any other file
Round-trips with the visual editor — no lossy serialization
Linting and type-checking at compile time, not 3 a.m.

incremental: events -> daily_kpis

source: events · watermark: 2026-04-18T23:59Z

only new rows processed

target: daily_kpis · merge on (day, metric) · +214 rows

day · metric · value · src_rows · ingested

// materialize.incremental

Only the new rows. Every time.

Every step tracks a watermark on an ordered column (timestamps, log sequence numbers, Iceberg snapshots). When the pipeline runs, it pulls only rows past the last successful watermark — and idempotently upserts them into the target.

Choose the materialization mode per step: merge on a key, append new partitions, or run a full refresh on a schedule you choose. No custom Python required.

Merge, append, or full-refresh modes with per-step overrides
Watermarks persisted transactionally — runs are resumable, not lossy
Partition-aware writes that minimize compaction overhead
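The watermark-and-merge behavior described above can be sketched in a few lines of Python. This is an illustration of the general technique, not DXData's implementation: the function name `incremental_merge`, its parameters, and the in-memory `target` dict are all assumptions made for the example.

```python
def incremental_merge(source_rows, target, last_watermark, key_fields,
                      watermark_field="updated_at"):
    """Upsert rows newer than last_watermark into target.

    target is a dict keyed by the merge key (hypothetical stand-in for a table).
    Returns the new watermark to persist after a successful run.
    """
    # Pull only rows strictly past the last successful watermark.
    new_rows = [r for r in source_rows if r[watermark_field] > last_watermark]

    # Merge on the key: re-processing the same row overwrites it in place,
    # so a retried run cannot create duplicates.
    for row in new_rows:
        key = tuple(row[f] for f in key_fields)
        target[key] = row

    if not new_rows:
        return last_watermark
    return max(r[watermark_field] for r in new_rows)


rows = [
    {"day": "2026-04-17", "metric": "orders", "value": 10, "updated_at": 1},
    {"day": "2026-04-18", "metric": "orders", "value": 12, "updated_at": 2},
]
table = {}
watermark = incremental_merge(rows, table, 0, ("day", "metric"))
# A second run with the saved watermark finds nothing new to process.
watermark = incremental_merge(rows, table, watermark, ("day", "metric"))
```

In a real system the target write and the watermark update would be committed together, which is what makes a failed run resumable rather than lossy.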

triggers/orders_ingest.yaml

# triggers/orders_ingest.yaml
triggers:
  - type: cron
    expr: '*/15 * * * *'
  - type: event
    source: s3
    bucket: raw-orders
    match: 'orders/*.parquet'
  - type: webhook
    auth: hmac
cascade:
  - clean_orders
  - fraud_features
  - bi_refresh

trigger: s3.put orders/*

1 event · 3 cascades · parallel · dedupe window: 5m

// triggers.cron_event_webhook

Run on a clock, an event, or a handshake.

Cron is the floor, not the ceiling. Trigger pipelines from S3 object events, HTTP webhooks, Kafka topics, or upstream pipeline completions. Manual runs are first-class — no separate "run now" console.

Cascade triggers let one upstream fan out to dozens of consumers without anyone wiring up sensors. Deduplication windows keep things sane during storms.

Cron, events, webhooks, and cascade triggers in the same manifest
Idempotent by design — retries never double-process
Deduplication windows so bursty sources never thrash downstreams
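A deduplication window can be pictured as a small piece of state keyed by event. The sketch below is one plausible reading of the idea, assuming a fixed window measured from the last event that actually fired; the class name `DedupeWindow` and its methods are hypothetical and not part of any DXData API.

```python
class DedupeWindow:
    """Suppress repeat events for the same key within window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._last_fired = {}  # event key -> timestamp of last fired event

    def should_fire(self, event_key, now):
        last = self._last_fired.get(event_key)
        if last is not None and now - last < self.window_seconds:
            # Still inside the window: drop the duplicate.
            return False
        self._last_fired[event_key] = now
        return True


window = DedupeWindow(window_seconds=300)  # 5m, as in the example above
fired = [window.should_fire("orders/2026-04-18.parquet", t) for t in (0, 10, 299, 300)]
```

Suppressed events do not extend the window here, so a steady storm of duplicates still lets one event through every five minutes instead of starving downstream pipelines.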

alerts/daily_kpis

SLA breached · daily_kpis

2m ago

Runtime 21m exceeded SLA of 12m. Paging on-call via PagerDuty.

severity: page · channel: #data-oncall

runtime (last 24 runs) · SLA: 12m

// sla.monitoring

When the pipeline slips, you know before your dashboard does.

Declare a freshness SLA ("daily_kpis must be < 30m stale") and a duration SLA ("must finish under 12m p95") in the manifest. DXData watches every run against both — and pages you through the channel you care about.

Alerts route to Slack, PagerDuty, Opsgenie, Microsoft Teams, or any webhook, with severity rules that escalate before the on-call gets paged twice.

Freshness and runtime SLAs declared per pipeline — no separate monitoring stack
Integrations: Slack, PagerDuty, Opsgenie, Microsoft Teams, generic webhook
Severity routing with escalation and auto-silence during backfills
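To make the two SLA types concrete, here is a rough Python sketch of how a freshness limit and a p95 runtime limit might be evaluated over recent runs. The nearest-rank percentile, the function names, and the default limits (taken from the 30m and 12m figures above) are assumptions for illustration; the page does not say how DXData computes p95.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile; assumes at least one value."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def check_sla(runtimes_min, minutes_since_last_success,
              runtime_p95_limit=12, freshness_limit=30):
    """Return the list of breached SLAs (empty when everything is healthy)."""
    breaches = []
    if p95(runtimes_min) > runtime_p95_limit:
        breaches.append("runtime_p95")
    if minutes_since_last_success > freshness_limit:
        breaches.append("freshness")
    return breaches


# Last 24 runs: mostly 8 minutes, with two slow 21-minute runs.
recent_runtimes = [8] * 22 + [21, 21]
breaches = check_sla(recent_runtimes, minutes_since_last_success=2)
```

With 24 samples the nearest-rank p95 is the 23rd-smallest runtime, so two slow runs are enough to trip the runtime SLA while the freshness SLA stays green.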

// testing.built_in

Tests live with the pipeline, not in a separate repo.

Assertions like expect_not_null, expect_unique, and expect_values_in_set are declared beside the transforms they validate. They run on every pull request, on every scheduled run, and on every backfill — and they can block a merge.

Pair tests with the Git-like Branching capability and you get a true preview environment: open a branch, run your pipeline against a zero-copy snapshot, let tests gate the merge, then publish.

Column, row, and pipeline-level assertions in the same YAML
Fail-fast gates that block merges and deploys automatically
Native CI/CD hooks for GitHub Actions, GitLab CI, and webhooks
Every test run linked to a commit, a branch, and a dataset snapshot

branch-based testing · fail-closed by default

tests/daily_kpis.yaml

# pipelines/daily_kpis.yaml
tests:
  - column: day
    expect_not_null: true
  - column: order_id
    expect_unique: true
  - column: status
    expect_values_in_set:
      - 'paid'
      - 'refunded'
      - 'pending'
 
on_failure:
  action: block_merge
  notify: ["#data-quality"]
4 tests · 2 pipelines covered
CI: required
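For readers who want to see what the three assertions in the manifest above check, here is a minimal Python sketch over a list of row dicts. The function bodies are assumptions about the obvious semantics of each name; they are not DXData's implementation, and `failed_tests` is a hypothetical helper.

```python
def expect_not_null(rows, column):
    return all(row.get(column) is not None for row in rows)


def expect_unique(rows, column):
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def expect_values_in_set(rows, column, allowed):
    return all(row[column] in allowed for row in rows)


def failed_tests(rows):
    """Run the three checks from the manifest and return the names that failed."""
    results = {
        "day not null": expect_not_null(rows, "day"),
        "order_id unique": expect_unique(rows, "order_id"),
        "status in set": expect_values_in_set(
            rows, "status", {"paid", "refunded", "pending"}
        ),
    }
    return [name for name, passed in results.items() if not passed]


good_rows = [
    {"day": "2026-04-18", "order_id": 1, "status": "paid"},
    {"day": "2026-04-18", "order_id": 2, "status": "refunded"},
]
# A fail-closed gate would block the merge whenever this list is non-empty.
failures = failed_tests(good_rows)
```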

backfill: 30 days of daily_kpis

5× faster · partition-aware · idempotent per-day

// backfill.parallel

Backfill a month in the time it takes to run a day.

One command replays any window (a day, a month, a year) without manually chunking partitions. The scheduler distributes partitions across available workers, so a 30-day backfill runs as parallel per-day jobs instead of one long sequential replay.

Point-in-time replays are first-class: branch the catalog to a historical commit, run the pipeline against it, inspect the result, then merge or drop.

Intelligent parallelism — the scheduler decides the wave size, you pick the range
Partition-aware so nothing gets double-written
Point-in-time replay using the versioned catalog
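The partition-parallel idea can be sketched with the Python standard library. This is only an illustration of fanning per-day partitions out to a worker pool: `process_day` is a hypothetical callback standing in for one idempotent per-day run, and the fixed `max_workers` replaces whatever wave sizing the real scheduler performs.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def day_partitions(start, end):
    """List every day from start to end, inclusive."""
    count = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(count)]


def backfill(start, end, process_day, max_workers=5):
    """Run process_day once per partition, several partitions at a time."""
    partitions = day_partitions(start, end)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with partitions.
        results = list(pool.map(process_day, partitions))
    return dict(zip(partitions, results))


# Replay 30 days; the callback here just echoes the partition it was given.
report = backfill(date(2026, 3, 20), date(2026, 4, 18), lambda day: day.isoformat())
```

Because each day is written independently and keyed by its partition, re-running the same range overwrites the same partitions rather than writing them twice.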

// anatomy

A pipeline, labelled.

Ingest, clean, aggregate, publish — with quality gates between every stage and a branch point where preview pipelines diverge from production.

// sources and sinks

Read from anywhere. Write anywhere.

Every connector is bidirectional: pull raw data in or publish materialized outputs out. Pipelines wire them up without glue scripts.

Warehouses: Snowflake, BigQuery, Redshift, Databricks
Streams: Kafka, Kinesis, Pulsar, Pub/Sub
Databases: Postgres, MySQL, MongoDB, SQL Server
Files: GCS, Azure Blob, Iceberg
SaaS: Salesforce, HubSpot, Stripe, Segment

View all 100+ connectors

// replace your stack

One pipeline runtime, no duct tape.

DXData Pipelines folds orchestration, SQL transforms, and scheduling into one declarative layer — backed by the same catalog and RBAC you already use.

vs Airflow

Replaces Airflow

Native understanding of tables, partitions, and watermarks — no XComs
Built-in quality tests and SLA alerts without DIY sensors
Declarative YAML that renders a diff in every pull request

vs dbt

Replaces dbt + runner

Ingest, transform, and publish in one manifest — not just SQL models
Event triggers and streaming modes, not only cron
Zero-copy branch-based testing against the real catalog

vs Prefect

Replaces Prefect

No Python boilerplate required — declarative first, code optional
Partition-aware backfills that parallelize automatically
Governance, RBAC, and lineage inherited from the platform

// use cases

What teams actually ship.

Marketing ELT

Unify HubSpot, Stripe, and GA4 into one attribution table. Incremental hourly, backfillable for any window.

pipelines/marketing_elt.yaml

# marketing_elt.yaml
sources: [hubspot, stripe, ga4]
steps:
  - unify_customer_id
  - attribute_sessions
  - rollup_to_accounts
sink: iceberg.marketing.attribution
schedule: hourly

Real-time fraud features

Stream authorization events into a feature store with a 30-second freshness SLA. Auto-pages when the pipeline lags.

pipelines/fraud_features.yaml

# fraud_features.yaml
trigger: kafka:txn.auth
mode: streaming
features:
  - txn_velocity_5m
  - geo_hop_distance
sla: { freshness: 30s }
sink: online_store.fraud

Data quality gating

Run assertions in a preview branch, block publishing on failure, notify Slack. Bad data never reaches prod.

pipelines/quality_gating.yaml

# quality_gating.yaml
stage: preview
tests:
  - expect_not_null: id
  - expect_unique: order_id
on_failure:
  action: block_publish
  notify: slack:#data

// faq

Questions teams ask before they migrate.

Can I migrate my Airflow DAGs?

Yes. Our migration assistant parses an Airflow DAG file, maps Python operators onto native DXData steps, and produces an equivalent YAML manifest. Most DAGs convert cleanly — custom Python operators become inline tasks until you choose to rewrite them declaratively.

How do transformations handle schema evolution?

Iceberg tables absorb additive changes automatically (new columns, widened types), and the pipeline compiler rejects breaking changes in CI with a diff. For renames and drops, you declare the migration in the manifest — the platform writes a compatible view and backfills in the background.
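The split between additive and breaking changes can be illustrated with a small schema diff in Python. Everything here is a sketch under assumptions: schemas are plain dicts of column name to type name, `classify_schema_change` is a hypothetical helper, and the only widenings treated as safe are the int to long and float to double promotions that Iceberg permits.

```python
# Type promotions treated as safe in this sketch (assumption: a minimal subset).
SAFE_WIDENINGS = {("int", "long"), ("float", "double")}


def classify_schema_change(old, new):
    """Compare two {column: type} schemas and sort changes into two buckets."""
    additive, breaking = [], []
    for column in sorted(set(new) - set(old)):
        additive.append(f"add {column}")
    for column in sorted(set(old) - set(new)):
        breaking.append(f"drop {column}")
    for column in sorted(set(old) & set(new)):
        if old[column] == new[column]:
            continue
        change = f"retype {column}: {old[column]} -> {new[column]}"
        if (old[column], new[column]) in SAFE_WIDENINGS:
            additive.append(change)
        else:
            breaking.append(change)
    return {"additive": additive, "breaking": breaking}


before = {"id": "int", "amount": "float", "status": "string"}
after = {"id": "long", "amount": "float", "coupon": "string"}
diff = classify_schema_change(before, after)
# A CI gate would fail the build whenever diff["breaking"] is non-empty.
```

A rename shows up in this diff as a drop plus an add, which is why renames need an explicit migration declared in the manifest rather than being inferred.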

What about dbt projects?

Point DXData at your dbt project directory and we import models, tests, and sources as pipeline steps. You keep your SQL intact; you gain ingestion, triggers, backfills, and SLA alerts without standing up a separate orchestrator.

How do I test pipelines?

Declare assertions beside the transforms they validate. On every commit, DXData runs the pipeline against a zero-copy catalog branch and blocks the merge if a test fails. You can also run ad-hoc test suites from the CLI or your CI workflow.

From plan to production

Ship your next pipeline in minutes.

Start with one YAML file. Add triggers, backfills, and SLA alerts as you grow. Nothing to self-host, nothing to glue together.

Start free trial Book a demo



