Product · March 4, 2026 · 7 min read

Why we built Nessie-style branching into DXData

Most catalogs treat branching as an afterthought. We treated it as a load-bearing primitive. Here is the product thesis behind that decision and where the model is heading.

Priya Venkatesan, Head of Product

When we started DXData, we made one opinionated call that has shaped almost every product decision since: branching is not a feature, it is the unit of work.

The industry had already converged on Apache Iceberg for the table format. The open question was what sat on top. Project Nessie had shown that Git-style semantics (branches, commits, merges) could work over Iceberg tables. We doubled down on that model, rebuilt the catalog around it, and made branching the default way to make almost any change.

The thesis

Our belief is that data teams want the same workflow their engineering peers have. Open a PR. Get CI. Merge when green. Revert when wrong.

Every change is a branch. Backfills, schema migrations, pipeline reruns — all of them open a branch against the target catalog.
CI runs on the branch. Data quality tests, row-count checks, and approval gates run against the branch snapshot, not a copy.
Merge is atomic. Either every change in the branch lands or none of it does.
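
The three rules above can be sketched as a toy in-memory model. This is an illustration only, not DXData's API: the `Catalog` class, its methods, the `MergeRejected` exception, and the snapshot IDs are all hypothetical names invented for the example, and the merge ignores conflict detection. It shows a branch as a set of pointers to table snapshots rather than a copy of the data, checks running against the branch state, and a merge that lands in full or not at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable

# Hypothetical type: a branch maps table name -> snapshot id (a pointer, not data).
BranchState = Dict[str, str]


class MergeRejected(Exception):
    """Raised when a check fails; the target branch is left untouched."""


@dataclass
class Catalog:
    """Toy catalog for illustration; not a real DXData or Nessie client."""

    branches: Dict[str, BranchState] = field(
        default_factory=lambda: {"main": {}}
    )

    def create_branch(self, name: str, source: str = "main") -> None:
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        # Copy the pointers only; no table data is duplicated.
        self.branches[name] = dict(self.branches[source])

    def commit(self, branch: str, table: str, snapshot_id: str) -> None:
        # Point the table at a new snapshot on this branch only.
        self.branches[branch][table] = snapshot_id

    def merge(
        self,
        branch: str,
        target: str = "main",
        checks: Iterable[Callable[[BranchState], bool]] = (),
    ) -> None:
        head = self.branches[branch]
        # "CI": every check runs against the branch state itself.
        failed = [check.__name__ for check in checks if not check(head)]
        if failed:
            raise MergeRejected(f"checks failed: {failed}")
        # Build the full new state first, then swap it in with one assignment,
        # so the target never holds a partial merge. Simplification: the
        # branch always wins; a real catalog would detect conflicts.
        self.branches[target] = {**self.branches[target], **head}


def has_orders_table(state: BranchState) -> bool:
    # Stand-in for a data quality test.
    return "orders" in state


catalog = Catalog()
catalog.commit("main", "orders", "snap-001")

# A schema migration opens a branch instead of touching main.
catalog.create_branch("growth-migration-v2")
catalog.commit("growth-migration-v2", "orders", "snap-002")
catalog.commit("growth-migration-v2", "customers", "snap-101")

# Merge when green: both table changes land together.
catalog.merge("growth-migration-v2", checks=[has_orders_table])
print(catalog.branches["main"])
```

Until the merge runs, readers of `main` keep seeing `snap-001`. If any check returns False, `merge` raises before the target is reassigned, so neither table change lands.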

Where this goes next

The next step is making branches first-class in downstream tools — BI, notebooks, orchestrators. We want a Looker query to be able to say "give me this dashboard, but against branch growth-migration-v2" and have the full data stack honor that without a single ETL rerun.

Branching is the thing that lets a data change move through review without blocking the people doing other data changes.

Written by

Priya Venkatesan

Head of Product at DXData.

Start building with DXData.

Spin up a catalog in minutes. Bring your own object store, keep your existing SQL, and branch your data like code.


