Git-like branching

Branch your data. Ship with confidence.

Write-audit-publish workflows powered by Nessie. Every table, every schema, every dataset — versioned, reversible, and zero-copy.


catalog.nessie · recent activity · 14 commits

main · feature/new-metric · hotfix

// the old way

Shipping a schema change used to mean a maintenance window. Not anymore.

The traditional playbook for changing a production table reads like a deploy from 2012: stop the pipeline, copy the table, run the migration, backfill, swap names, pray. Teams burn weekends on it, and that is before anyone asks what happens if a last-minute bug in the transform corrupts a million rows.

DXData inherits the answer from software engineering. Branch the data, do the risky work in isolation, run your tests, open a PR, merge on green. When something goes wrong — and it will — roll back to a tag in seconds instead of restoring from last night's backup. Your pipelines never stop. Your dashboards never blink.

Branch any table

Fork a single table, a schema, or your whole catalog. Every branch has its own working history.

Zero-copy

Branches share underlying Iceberg snapshots until you write. No storage tax for keeping six branches around.

Reversible

Every commit is a pointer. Roll back a bad migration in seconds — no restore job required.

~/dxdata · create-branch

# fork main into an isolated working branch
$ dxdata branch create feature/new-metric --from main
  created feature/new-metric at snapshot 8a4f1e9
 
# list branches
$ dxdata branch list
  * main               8a4f1e9  2h ago
    feature/new-metric  8a4f1e9  just now
    experiment/churn    6d2ac71  3d ago

// branch.create

Create a branch in seconds.

Nessie gives DXData the same primitives Git gave code: named refs, tags, immutable commits, atomic merges. A branch is a pointer — not a copy, not an ETL job, not a new schema you have to garbage-collect next quarter.

Your analysts branch the catalog the way they branch a repo. Your platform team enforces policy on the merge, the same way a GitHub Actions workflow gates a release.

Fork a single table or the whole catalog
Works with every source DXData ingests
Audit log on every branch, tag, and merge

nessie://catalog/orders

ref main · snapshot 8a4f..9

ref feature/new-metric · snapshot 8a4f..9

part-0001.parquet

part-0002.parquet

part-0003.parquet

part-0004.parquet

both refs point to the same immutable files — 0 bytes copied

// zero-copy.isolation

Isolated branches that cost nothing.

Under the hood, each branch is a Nessie reference that points at the same Iceberg snapshot as its parent. Until you actually write, both refs resolve to the exact same parquet files on object storage — 0 bytes of new data, 0 scheduler pressure, 0 copy jobs.

The moment you write, only the changed snapshot metadata and any rewritten files live on the branch. That is what makes it safe to leave a dozen experimental branches running overnight without paging your FinOps team.

Iceberg snapshots are immutable — branches are just new pointers
Copy-on-write at the file level, not the table level
Ephemeral branches cost effectively zero
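
To make the pointer idea concrete, here is a minimal in-memory sketch of file-level copy-on-write. It is a toy model, not DXData or Nessie code: `Snapshot`, `ToyCatalog`, and the rewritten file name `part-0002-v2.parquet` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable set of data file names (a stand-in for an Iceberg snapshot)."""
    files: frozenset


class ToyCatalog:
    """Hypothetical in-memory catalog mapping branch names to snapshots."""

    def __init__(self, files):
        self.refs = {"main": Snapshot(frozenset(files))}

    def create_branch(self, name, source="main"):
        # A new branch is only a new pointer; no files are copied.
        self.refs[name] = self.refs[source]

    def rewrite_file(self, branch, old, new):
        # Copy-on-write at the file level: the new snapshot swaps one file
        # and reuses every other file from the parent snapshot.
        current = self.refs[branch].files
        self.refs[branch] = Snapshot((current - {old}) | {new})

    def unique_files(self, branch, parent="main"):
        # Files that exist only on the branch, i.e. its extra storage.
        return self.refs[branch].files - self.refs[parent].files


catalog = ToyCatalog([
    "part-0001.parquet", "part-0002.parquet",
    "part-0003.parquet", "part-0004.parquet",
])
catalog.create_branch("feature/new-metric")

# Before any write, both refs resolve to the very same snapshot object.
shared_before = catalog.refs["feature/new-metric"] is catalog.refs["main"]

catalog.rewrite_file("feature/new-metric", "part-0002.parquet", "part-0002-v2.parquet")
extra = sorted(catalog.unique_files("feature/new-metric"))
print(shared_before, extra)
```

After the write, only the rewritten file is unique to the branch; the other three files are still shared with main, and main itself is unchanged.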

pr-218 · write-audit-publish

Write: branch create

Audit: tests + review

Publish: merge to main

ci · data quality · all checks passed · 2.5s

unique_key on orders.id · 1.2s
not_null on orders.total · 0.4s
row count within +/- 2% · 0.8s
schema contract: v4.2 · 0.1s

// wap.one-command

Write. Audit. Publish.

Write-Audit-Publish is the industry-standard pattern for safely landing data changes — write to a side table, audit it, then swap it in. Great idea, historically a mountain of YAML to implement.

DXData collapses it into one command. The branch is your side table. The quality checks are your audit. Fast-forwarding main is your publish. Green means ship, red means the branch stays put and nothing in production moves.

dbt tests, great-expectations checks, and native quality gates — all first-class
Policy hooks can block a merge on compliance, PII, or lineage rules
Every merge writes a signed commit to the audit log
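
The gate logic can be sketched in a few lines. This is an illustrative model only, assuming rows are plain dictionaries; the function names and the sample data are hypothetical and do not reflect DXData's actual checks or API.

```python
def unique_key(rows, key):
    """True when no two rows share the same value for `key`."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


def not_null(rows, column):
    """True when every row has a non-null value for `column`."""
    return all(row.get(column) is not None for row in rows)


def row_count_within(rows, baseline_count, tolerance=0.02):
    """True when the row count is within +/- tolerance of the baseline."""
    return abs(len(rows) - baseline_count) <= tolerance * baseline_count


def write_audit_publish(main_rows, branch_rows):
    """Audit the branch; publish it only if every check passes.

    Returns (published_rows, failed_checks). On failure the rows
    returned are main's, so production is left untouched.
    """
    checks = {
        "unique_key on id": unique_key(branch_rows, "id"),
        "not_null on total": not_null(branch_rows, "total"),
        "row count within +/- 2%": row_count_within(branch_rows, len(main_rows)),
    }
    failed = [name for name, passed in checks.items() if not passed]
    published = branch_rows if not failed else main_rows
    return published, failed


# Hypothetical sample data: 100 orders on main.
main_rows = [{"id": i, "total": 10.0} for i in range(100)]
good_branch = main_rows + [{"id": 100, "total": 12.5}]
bad_branch = main_rows[:-1] + [{"id": 99, "total": None}]

published_good, failed_good = write_audit_publish(main_rows, good_branch)
published_bad, failed_bad = write_audit_publish(main_rows, bad_branch)
print(failed_good, failed_bad)
```

The green branch replaces main; the red branch reports the failing check and main's rows are returned unchanged.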

pr/218 · catalog.orders — schema diff

Add ltv_score to orders, normalize timestamps
feature/new-metric → main · opened by @kira

review requested

+ column   ltv_score        DECIMAL(10,4)            new

~ column   created_at       TIMESTAMP → TIMESTAMPTZ  type

→ column   cust_id          renamed to customer_id   rename

- column   legacy_flag      BOOLEAN                  removed

Row count: 1,284,992 rows (+0.4%)
Null rate on customer_id: 0.00%
Downstream dashboards: 2 use legacy_flag

// review.like-code

Review data changes the way you review code.

When someone opens a branch for review, DXData renders a diff your team actually understands: columns added, types changed, rows added or deleted, with a side-by-side of quality metrics before and after.

Approvers comment on specific rows or columns, block merges, or request changes — all in the same place. No Slack threads, no screenshots, no guessing which version of the dashboard is current.

Column-level diff with rename detection
Quality checks and lineage impact surfaced inline
Approve, request changes, or require a second reviewer — per-branch policy
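
A column-level diff is, at its core, a comparison of two column-to-type mappings. The sketch below is a simplified illustration: `schema_diff` is a hypothetical helper, the `BIGINT` types for `id`, `cust_id`, and `customer_id` are assumptions, and it does not attempt rename detection, so a renamed column shows up as one removal plus one addition.

```python
def schema_diff(old, new):
    """Compare two {column: type} mappings.

    Returns (added, removed, changed), where `changed` maps a column
    name to its (old_type, new_type) pair.
    """
    added = {col: typ for col, typ in new.items() if col not in old}
    removed = {col: typ for col, typ in old.items() if col not in new}
    changed = {
        col: (old[col], new[col])
        for col in old.keys() & new.keys()
        if old[col] != new[col]
    }
    return added, removed, changed


# Schemas modeled on the diff above; BIGINT types are assumed.
old_schema = {
    "id": "BIGINT",
    "created_at": "TIMESTAMP",
    "cust_id": "BIGINT",
    "legacy_flag": "BOOLEAN",
}
new_schema = {
    "id": "BIGINT",
    "created_at": "TIMESTAMPTZ",
    "customer_id": "BIGINT",
    "ltv_score": "DECIMAL(10,4)",
}

added, removed, changed = schema_diff(old_schema, new_schema)
print(sorted(added), sorted(removed), changed)
```

Pairing a removed column with an added column of the same type is one plausible starting point for rename detection, though a real implementation would need stronger evidence than a type match.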

~/dxdata · rollback

# tag the last known-good state
$ dxdata tag prod-2026-04-19-090000 --branch main
  tag created
 
# bad migration shipped — roll back
$ dxdata reset --to prod-2026-04-19-090000
  rollback complete · 820ms · 0 bytes moved

catalog/tags · rollback timeline

prod-2026-04-19-090000 · live
today · 09:00 · last known good
prod-2026-04-18-090000
yesterday · 09:00 · pre-migration
prod-2026-04-17-090000
2 days ago · baseline
prod-2026-04-16-090000
3 days ago · baseline

// rollback.tagged

Rollbacks are a first-class operation.

Tags pin a commit so you can always get back to a known-good state. Cut a tag like prod-2026-04-19-090000 every night or before each production deploy, and a bad migration becomes a one-command undo.

Because rollbacks only rewrite the catalog reference — not the underlying files — they are instant, atomic, and safe to run from the on-call laptop at 2am.

Automated nightly tags or tag on every merge
Rollbacks complete in under a second at any table size
Historical tags remain queryable forever — great for audit
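
Why a rollback moves no data can be seen in a toy model where branches and tags are both just names for commit ids. `ToyRefs` and the commit id `bad0001` are hypothetical; this is a sketch of the idea, not the DXData or Nessie implementation.

```python
class ToyRefs:
    """Hypothetical reference store: branches and tags both map to commit ids."""

    def __init__(self, head):
        self.branches = {"main": head}
        self.tags = {}

    def tag(self, name, branch="main"):
        # A tag records the commit a branch points at right now.
        self.tags[name] = self.branches[branch]

    def commit(self, branch, commit_id):
        # Advancing a branch only changes which commit it points at.
        self.branches[branch] = commit_id

    def reset(self, branch, tag):
        # Rollback: repoint the branch at the tagged commit.
        # No data files are read, written, or moved.
        self.branches[branch] = self.tags[tag]


refs = ToyRefs("8a4f1e9")
refs.tag("prod-2026-04-19-090000")

refs.commit("main", "bad0001")  # hypothetical id for the bad migration
head_after_bad_commit = refs.branches["main"]

refs.reset("main", "prod-2026-04-19-090000")
head_after_rollback = refs.branches["main"]
print(head_after_bad_commit, head_after_rollback)
```

Because the reset is a single assignment in this model, its cost does not depend on how much data the table holds.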

time_travel.sql

-- query the catalog at a specific commit
SELECT * FROM orders
FOR VERSION AS OF '8a4f1e9';
 
-- or query by wall-clock time
SELECT * FROM orders
FOR TIMESTAMP AS OF '2026-04-18 10:00:00';

// time-travel

Query any point in history.

Every table in DXData is time-travelable out of the box. Query a past commit for debugging, reconcile yesterday's report against today, or reproduce a bug from last Tuesday without restoring a backup.

The syntax is standard SQL — the DXData query engine resolves the historical snapshot and plans the query against it, so your existing tooling just works.

FOR VERSION AS OF <commit> and FOR TIMESTAMP AS OF <ts>
Works across branches and tags — including historical rollbacks
Zero performance penalty for recent snapshots
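
Resolving a wall-clock time to a commit amounts to finding the latest commit at or before that time. The sketch below shows one way to do it with a binary search; the `history` list, its commit ids, and `commit_as_of` are all made up for illustration and say nothing about how the DXData query engine does it internally.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical commit log, sorted by commit time: (committed_at, commit_id).
history = [
    (datetime(2026, 4, 17, 9, 0), "aaa0001"),
    (datetime(2026, 4, 18, 9, 0), "bbb0002"),
    (datetime(2026, 4, 19, 9, 0), "ccc0003"),
]


def commit_as_of(history, ts):
    """Return the id of the latest commit made at or before `ts`."""
    times = [committed_at for committed_at, _ in history]
    index = bisect_right(times, ts)
    if index == 0:
        raise LookupError("no commit exists at or before the requested time")
    return history[index - 1][1]


resolved = commit_as_of(history, datetime(2026, 4, 18, 10, 0))
print(resolved)
```

A timestamp that falls between two commits resolves to the earlier one, which is the state a reader would have seen at that moment.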

.dxdata/ci.yaml

# .dxdata/ci.yaml — ephemeral branch per pipeline run
pipeline: orders_daily
on: schedule(hourly)
 
branch:
  strategy: ephemeral
  name: "ci/orders_daily-${RUN_ID}"
  from: main
 
audit:
  - run: "dbt test --select tag:critical"
  - run: "dxdata quality check orders"
 
publish:
  on_success: merge_to_main
  on_failure: discard_branch

// ci.ephemeral

Ephemeral branches for every pipeline run.

Point your CI at a branch, not at main. Each pipeline run forks a fresh branch, writes to it, and runs tests — all in full isolation from what your dashboards are reading.

If every check goes green, DXData merges the branch forward. If anything fails, the branch is discarded and your production catalog never sees the bad write. It is the same testing pattern every engineering team uses for code, applied to data.

Per-run branches keep concurrent pipelines from stepping on each other
Failures roll back for free — the branch just disappears
Works with GitHub Actions, GitLab CI, or any shell-driven runner
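
The control flow of a per-run branch can be sketched as a small runner. This is an assumption-laden illustration: `run_pipeline` is a hypothetical helper, and the `create`, `merge`, and `discard` callbacks stand in for whatever catalog commands a real runner would invoke. The audit steps here are placeholder commands that simply exit with 0 or 1.

```python
import subprocess
import sys


def run_pipeline(run_id, audit_commands, create, merge, discard):
    """Fork a per-run branch, run audits, then merge on success or discard."""
    branch = f"ci/orders_daily-{run_id}"
    create(branch)
    for command in audit_commands:
        # A non-zero exit code from any audit step fails the whole run.
        if subprocess.run(command).returncode != 0:
            discard(branch)
            return branch, "discarded"
    merge(branch)
    return branch, "merged"


# Record what the runner asks the catalog to do instead of calling a real one.
events = []
callbacks = dict(
    create=lambda branch: events.append(("create", branch)),
    merge=lambda branch: events.append(("merge", branch)),
    discard=lambda branch: events.append(("discard", branch)),
)

# Placeholder audit steps: one that passes and one that fails.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

green = run_pipeline("101", [passing, passing], **callbacks)
red = run_pipeline("102", [passing, failing], **callbacks)
print(green, red)
```

Each run gets its own branch name, so two concurrent runs never write to the same ref, and a failed run ends with a discard rather than a merge.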

// branching visualized

Your catalog, four weeks of history, one picture.

Four active branches. Fifteen commits. Two tagged releases. No storage bloat, no blocked merges, no sweat.

// in practice

Three shapes of problem, one pattern.

Safe schema migrations

Run a full backfill on a branch, prove the shape with production queries, then fast-forward main. No maintenance window. No frozen dashboards.

A/B test dataset variants

Spin up `variant/churn-model-v2` off main. Route half the inference traffic to it. Keep the winner, delete the loser — all branches, no copies.

Reproduce a customer report

A support ticket cites numbers from last Tuesday. Create a branch from that tag, run the exact report, diff against today. Case closed in five minutes.

// faq

Questions, answered.

Branches consume no extra storage until you modify data. A fresh branch is a 28-byte Nessie reference that points at the same Iceberg snapshot as main. Only the diff (new snapshot metadata and any rewritten parquet files) lives uniquely on the branch.

// related capabilities

Branching gets better with the rest.

// pipelines.native
Pipelines
Declarative YAML or visual DAGs that write straight to branches, run on schedule, and publish on green.

// catalog.versioned
Data Catalog
Every table, column, and lineage edge: automatically documented, automatically versioned.

// query.engine
Query Engine
Trino-powered federated SQL with native branch and time-travel syntax across every source you own.

Treat your data like your code

Branches, reviews, rollbacks. On your data.

Get started in under five minutes. No credit card, no data migration — point DXData at your lakehouse and start branching.

