When we started DXData, we made one opinionated call that has shaped almost every product decision since: branching is not a feature, it is the unit of work.
The industry had already converged on Apache Iceberg for the table format. The open question was what sat on top. The Project Nessie catalog had shown that Git-style semantics could work over Iceberg. We doubled down on that model, rebuilt the catalog around it, and made branching the first-class way to do almost anything.
The thesis
Our belief is that data teams want the same workflow their engineering peers have. Open a PR. Get CI. Merge when green. Revert when wrong.
- Every change is a branch. Backfills, schema migrations, pipeline reruns — all of them open a branch against the target catalog.
- CI runs on the branch. Data quality tests, row-count checks, and approval gates run against the branch snapshot, not a copy.
- Merge is atomic. Either every change in the branch lands or none of it does.
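To make those three rules concrete, here is a minimal Python sketch. It is an in-memory toy, not DXData's catalog or Nessie's API: Catalog, Branch, MergeRejected, and the two check functions are hypothetical names invented for illustration. It also deep-copies tables when a branch opens, which a real Iceberg-backed catalog would avoid by sharing data files through metadata.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical in-memory model, for illustration only.
Snapshot = Dict[str, List[dict]]  # table name -> rows


class MergeRejected(Exception):
    """Raised when a check fails or the target moved after branching."""


@dataclass
class Branch:
    base_version: int  # catalog version the branch was opened from
    tables: Snapshot   # the branch's private view of every table


@dataclass
class Catalog:
    main: Snapshot = field(default_factory=dict)
    version: int = 0  # bumped once per successful merge

    def open_branch(self) -> Branch:
        # Deep copy so edits on the branch never touch main.
        return Branch(base_version=self.version, tables=copy.deepcopy(self.main))

    def merge(self, branch: Branch, checks: List[Callable[[Snapshot], bool]]) -> None:
        # Reject stale branches rather than attempting a partial merge.
        if branch.base_version != self.version:
            raise MergeRejected("target changed since the branch was opened")
        # "CI" runs against the branch state itself.
        for check in checks:
            if not check(branch.tables):
                raise MergeRejected(f"check failed: {check.__name__}")
        # All-or-nothing: one reference swap publishes every change at once.
        # The branch is treated as closed after this point.
        self.main = branch.tables
        self.version += 1


def orders_not_empty(tables: Snapshot) -> bool:
    return len(tables["orders"]) > 0


def amounts_non_negative(tables: Snapshot) -> bool:
    return all(row["amount"] >= 0 for row in tables["orders"])


checks = [orders_not_empty, amounts_non_negative]
catalog = Catalog(main={"orders": [{"id": 1, "amount": 10}]})

# A backfill that passes its checks lands in full.
good = catalog.open_branch()
good.tables["orders"].append({"id": 2, "amount": 25})
catalog.merge(good, checks)

# A bad change fails a check and leaves main untouched.
bad = catalog.open_branch()
bad.tables["orders"].append({"id": 3, "amount": -5})
try:
    catalog.merge(bad, checks)
    rejected = False
except MergeRejected:
    rejected = True
```

The design choice worth noticing is that merge never edits main in place. It validates the whole branch first and then swaps a single reference, so readers of main see either the old state or the new one, never a mix.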
Where this goes next
The next step is making branches first-class in downstream tools — BI, notebooks, orchestrators. We want a Looker query to be able to say "give me this dashboard, but against branch growth-migration-v2" and have the full data stack honor that without a single ETL rerun.
Branching is what lets one data change move through review without blocking everyone working on other changes.
Written by
Priya Venkatesan
Head of Product at DXData.