Pipeline orchestration
Declarative YAML or a visual DAG editor. Incremental materialization, event triggers, backfills, and SLA alerts — all first-class.
// the problem
Most pipeline tools force a tradeoff. Orchestrators treat every task like an opaque shell command — they don't know a schema from a socket. SQL-only frameworks skip the ingest, triggers, and backfills that real pipelines need. Python-first runners hand you a blank file and call that flexibility.
DXData starts from the other direction: a declarative pipeline format that understands tables, partitions, and time. Visual when you want it, code-first when you need it, and consistent all the way from laptop to production.
Declarative YAML
One manifest describes sources, transforms, schedule, tests, and SLA. Reviewable in a pull request.
Visual DAG editor
Drag-and-drop graph that round-trips to the same YAML. Product folks sketch, engineers review the diff.
Incremental materialization
Watermarks are tracked for you. Only new partitions are recomputed, so backfills stay cheap.
// pipelines.as_code
Every pipeline is a single declarative manifest. It lives in your repo, renders a diff on every pull request, and is reviewed the same way as any other code. The visual DAG editor writes the same format, so designers and engineers share a single source of truth.
Because the manifest is plain data, you get all the usual tooling for free: search, lint, codegen, and AI review. No bespoke DSL, no proprietary bundle.
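Because the manifest is plain data, a lint rule is just ordinary code over a parsed dictionary. The sketch below is hypothetical: it assumes the YAML has already been parsed into a dict, and the top-level keys (`sources`, `transforms`, `schedule`, `tests`, `sla`) and the per-transform `inputs` list are illustrative names, not DXData's documented schema.

```python
from typing import Any

# Hypothetical top-level keys, named after what a manifest is said to describe.
REQUIRED_KEYS = ("sources", "transforms", "schedule", "tests", "sla")


def lint_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable lint problems for an already-parsed manifest."""
    problems = [f"missing top-level key: {k}" for k in REQUIRED_KEYS if k not in manifest]
    sources = manifest.get("sources", {})
    transforms = manifest.get("transforms", {})
    # A transform may read from a declared source or from another transform.
    known = set(sources) | set(transforms)
    for name, step in transforms.items():
        for inp in step.get("inputs", []):
            if inp not in known:
                problems.append(f"transform {name!r} reads unknown input {inp!r}")
    return problems
```

The same dictionary could feed search or code generation; nothing here depends on a bespoke DSL.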
// materialize.incremental
Every step tracks a watermark on an ordered column (timestamps, log sequence numbers, Iceberg snapshots). When the pipeline runs, it pulls only rows past the last successful watermark — and idempotently upserts them into the target.
Choose the materialization mode per step: merge on a key, append new partitions, or run a full refresh on a schedule you choose. No custom Python required.
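The watermark-and-mode behavior described above can be sketched in plain Python. This is a minimal illustration, not DXData's implementation: `run_incremental`, its parameters, and the in-memory list standing in for a target table are hypothetical, and the watermark is assumed to be an integer on an ordered column (for example a log sequence number).

```python
from typing import Any, Optional

Row = dict[str, Any]


def run_incremental(
    source: list[Row],
    target: list[Row],
    watermark: Optional[int],
    *,
    mode: str,       # "merge", "append", or "full_refresh" (hypothetical mode names)
    key: str,        # merge key, used only by "merge"
    order_col: str,  # ordered column the watermark tracks
) -> tuple[list[Row], Optional[int]]:
    """Return the new target contents and the advanced watermark."""
    if mode == "full_refresh":
        # Ignore the watermark and rebuild the target from the whole source.
        new_rows = list(source)
        result = [dict(r) for r in new_rows]
    else:
        # Pull only rows strictly past the last successful watermark.
        new_rows = [r for r in source if watermark is None or r[order_col] > watermark]
        if mode == "append":
            result = target + [dict(r) for r in new_rows]
        elif mode == "merge":
            # Upsert on the key: applying the same rows again yields the same table.
            by_key = {r[key]: r for r in target}
            for r in new_rows:
                by_key[r[key]] = dict(r)
            result = list(by_key.values())
        else:
            raise ValueError(f"unknown mode: {mode}")
    # Advance the watermark only past rows that were actually processed.
    if new_rows:
        newest = max(r[order_col] for r in new_rows)
        watermark = newest if watermark is None else max(watermark, newest)
    return result, watermark
```

Persisting the returned watermark only after the write commits is what makes a failed run safe to retry: the next attempt starts from the last successful value.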
// triggers.cron_event_webhook
Cron is the floor, not the ceiling. Trigger pipelines from S3 object events, HTTP webhooks, Kafka topics, or upstream pipeline completions. Manual runs are first-class — no separate "run now" console.
Cascade triggers let one upstream pipeline fan out to dozens of consumers without anyone wiring up sensors. Deduplication windows collapse repeated triggers during event storms.
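Cascade fan-out with a deduplication window can be modeled as a small router. This is a hypothetical sketch: `TriggerRouter` and its methods are illustrative names, and it assumes one possible window rule, where a repeat trigger for the same upstream/downstream pair is dropped if that pair already fired less than `window_seconds` ago.

```python
from collections import defaultdict


class TriggerRouter:
    """Fan upstream events out to subscribed pipelines, suppressing
    duplicate triggers that arrive inside a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.subscribers: dict[str, list[str]] = defaultdict(list)
        self.last_fired: dict[tuple[str, str], float] = {}

    def subscribe(self, upstream: str, downstream: str) -> None:
        self.subscribers[upstream].append(downstream)

    def on_event(self, upstream: str, ts: float) -> list[str]:
        """Return the downstream pipelines to start for an event at time ts."""
        fired = []
        for downstream in self.subscribers[upstream]:
            pair = (upstream, downstream)
            last = self.last_fired.get(pair)
            # Drop the trigger if this pair already fired within the window.
            if last is not None and ts - last < self.window:
                continue
            self.last_fired[pair] = ts
            fired.append(downstream)
        return fired
```

With a 60-second window, a burst of object events at t=0 and t=30 starts each consumer once; an event at t=90 starts them again.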
// sla.monitoring
Declare a freshness SLA ("daily_kpis must be < 30m stale") and a duration SLA ("must finish under 12m p95") in the manifest. DXData watches every run against both — and pages you through the channel you care about.
Alerts route to Slack, PagerDuty, Opsgenie, or any webhook, with severity rules that escalate an unresolved alert rather than paging the same on-call engineer twice.
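Checking a run against both SLA types takes two comparisons: staleness against the freshness limit, and the p95 of recent run durations against the duration limit. A hypothetical sketch with illustrative names; it assumes all values are in minutes and uses the nearest-rank method for p95, which may differ from how DXData computes percentiles.

```python
from dataclasses import dataclass


@dataclass
class SlaViolation:
    kind: str        # "freshness" or "duration"
    observed: float  # minutes
    limit: float     # minutes


def p95(durations: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(durations)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), avoids float rounding
    return ordered[rank - 1]


def check_slas(
    minutes_since_success: float,
    recent_durations: list[float],
    freshness_limit: float,
    duration_p95_limit: float,
) -> list[SlaViolation]:
    violations = []
    # "must be < 30m stale": being 30m or more stale is a breach.
    if minutes_since_success >= freshness_limit:
        violations.append(SlaViolation("freshness", minutes_since_success, freshness_limit))
    # "must finish under 12m p95": a p95 of 12m or more is a breach.
    if recent_durations:
        observed = p95(recent_durations)
        if observed >= duration_p95_limit:
            violations.append(SlaViolation("duration", observed, duration_p95_limit))
    return violations
```

Each returned violation would then be handed to whatever routing and severity rules are configured; that step is left out here.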
// testing.built_in
Assertions like expect_not_null, expect_unique, and expect_values_in_set are declared beside the transforms they validate. They run on every pull request, on every scheduled run, and on every backfill — and they can block a merge.
Pair tests with the Git-like Branching capability and you get a true preview environment: open a branch, run your pipeline against a zero-copy snapshot, let tests gate the merge, then publish.
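The three assertion names above map naturally onto small check functions, plus a gate that blocks publishing when any check fails. This is a hypothetical sketch: the factory signatures, the failure messages, and `gate` are illustrative and not DXData's assertion engine. As a design choice, `expect_values_in_set` skips nulls so that missing values are reported only by `expect_not_null`.

```python
from typing import Any, Callable, Iterable

Row = dict[str, Any]
Check = Callable[[list[Row]], list[str]]


def expect_not_null(column: str) -> Check:
    def check(rows: list[Row]) -> list[str]:
        bad = [i for i, r in enumerate(rows) if r.get(column) is None]
        return [f"{column}: null at rows {bad}"] if bad else []
    return check


def expect_unique(column: str) -> Check:
    def check(rows: list[Row]) -> list[str]:
        seen, dups = set(), set()
        for r in rows:
            value = r.get(column)
            if value in seen:
                dups.add(value)
            seen.add(value)
        return [f"{column}: duplicate values {sorted(dups, key=repr)}"] if dups else []
    return check


def expect_values_in_set(column: str, allowed: Iterable[Any]) -> Check:
    allowed_set = set(allowed)
    def check(rows: list[Row]) -> list[str]:
        # Nulls are ignored here; expect_not_null is responsible for them.
        present = {r.get(column) for r in rows if r.get(column) is not None}
        bad = present - allowed_set
        return [f"{column}: unexpected values {sorted(bad, key=repr)}"] if bad else []
    return check


def gate(rows: list[Row], checks: list[Check]) -> tuple[bool, list[str]]:
    """Run every check; publishing is allowed only if none report failures."""
    failures = [msg for check in checks for msg in check(rows)]
    return (not failures, failures)
```

In a preview branch, a `False` from `gate` is the signal that would block the merge instead of publishing.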
// backfill.parallel
One command replays any window (a day, a month, a year) without manually chunking partitions. The scheduler spreads partitions across available workers and runs them in parallel, so with enough workers a 30-day backfill can finish in roughly the time a sequential tool needs for a single day.
Point-in-time replays are first-class: branch the catalog to a historical commit, run the pipeline against it, inspect the result, then merge or drop.
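Parallel backfill reduces to two steps: expand the window into partitions, then run independent partitions concurrently. A sketch under stated assumptions: daily partitions that can be computed independently, hypothetical `backfill` and `run_partition` names, and a local thread pool standing in for a cluster of workers (threads suit work that mostly waits on a warehouse or I/O).

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable


def daily_partitions(start: date, end: date) -> list[date]:
    """All daily partitions from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def backfill(
    start: date,
    end: date,
    run_partition: Callable[[date], str],
    workers: int = 8,
) -> dict[date, str]:
    """Run one job per partition across a pool of workers."""
    parts = daily_partitions(start, end)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so results line up with partitions;
        # an exception in any partition is re-raised while iterating.
        results = pool.map(run_partition, parts)
        return dict(zip(parts, results))
```

Wall-clock time is roughly the number of partitions divided by the number of workers, times the per-partition runtime, which is why the speedup depends on how many workers are available.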
// anatomy
Ingest, clean, aggregate, publish — with quality gates between every stage and a branch point where preview pipelines diverge from production.
// sources and sinks
Every connector is bidirectional: pull raw data in or publish materialized outputs out. Pipelines wire them up without glue scripts.
// replace your stack
DXData Pipelines folds orchestration, SQL transforms, and scheduling into one declarative layer — backed by the same catalog and RBAC you already use.
vs Airflow
vs dbt
vs Prefect
// use cases
Unify HubSpot, Stripe, and GA4 into one attribution table. Incremental hourly, backfillable for any window.
Stream authorization events into a feature store with 30-second freshness SLA. Auto-pages when the pipeline lags.
Run assertions in a preview branch, block publishing on failure, notify Slack. Bad data never reaches prod.
// faq
Yes, existing Airflow DAGs can be migrated. Our migration assistant parses an Airflow DAG file, maps Python operators onto native DXData steps, and produces an equivalent YAML manifest. Most DAGs convert cleanly; custom Python operators become inline tasks until you choose to rewrite them declaratively.
Iceberg tables absorb additive changes automatically (new columns, widened types), and the pipeline compiler rejects breaking changes in CI with a diff. For renames and drops, you declare the migration in the manifest — the platform writes a compatible view and backfills in the background.
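The additive-versus-breaking split can be expressed as a comparison of two column maps, which is the kind of check a CI step could run before a merge. A simplified, hypothetical sketch: it compares columns by name only, ignores nullability and nested types, and treats just two of the type promotions Iceberg permits (int to long, float to double) as widenings. Because it matches by name, a rename shows up as a drop plus an add, which is one reason renames need an explicit migration.

```python
# Two of the type promotions Iceberg allows; others (e.g. decimal precision) are omitted.
WIDENINGS = {("int", "long"), ("float", "double")}


def classify_schema_change(
    old: dict[str, str], new: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Split the difference between two schemas into additive and breaking changes."""
    additive, breaking = [], []
    for col, typ in new.items():
        if col not in old:
            additive.append(f"add column {col} {typ}")
        elif old[col] != typ:
            if (old[col], typ) in WIDENINGS:
                additive.append(f"widen {col}: {old[col]} -> {typ}")
            else:
                breaking.append(f"change {col}: {old[col]} -> {typ}")
    for col in old:
        if col not in new:
            breaking.append(f"drop column {col}")
    return additive, breaking
```

A CI job would fail the build whenever the second list is non-empty and no migration is declared for those columns.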
Point DXData at your dbt project directory and we import models, tests, and sources as pipeline steps. You keep your SQL intact; you gain ingestion, triggers, backfills, and SLA alerts without standing up a separate orchestrator.
Declare assertions beside the transforms they validate. On every commit, DXData runs the pipeline against a zero-copy catalog branch and blocks the merge if a test fails. You can also run ad-hoc test suites from the CLI or your CI workflow.
From plan to production
Start with one YAML file. Add triggers, backfills, and SLA alerts as you grow. Nothing to self-host, nothing to glue together.