Engineering · March 22, 2026 · 9 min read

Branching production tables: a 6-month postmortem

We gave every engineer a git-style branch on their production tables. Here is everything that broke, what we shipped to fix it, and what we would not build again.


Ren Takahashi, Engineering, Catalog


Six months ago we rolled out branchable tables to every customer on the platform. The pitch was simple: treat a data change like a code change. Open a branch, run your backfill, let QA query it, merge it when green.

The pitch was good. The implementation had three sharp edges that took us the full six months to round off.

Sharp edge #1: branch cost was invisible

In the first week after launch, a single customer opened 420 branches against a 40 TB table. Most of those branches were abandoned once the query session ended, but their snapshots kept the underlying data files pinned, so none of those files could be reclaimed. Storage jumped 3.1x in a weekend.
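Why abandoned branches cost storage is easiest to see with a toy model. The sketch below assumes each ref pins exactly the data files in its head snapshot (real catalogs also keep snapshot history according to a retention policy), and a file can be deleted only when no live ref reaches it. `DataFile`, `pinned_files`, and `retained_bytes` are hypothetical names for illustration, not DXData APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """A physical data file in object storage (hypothetical model)."""
    path: str
    size_bytes: int


def pinned_files(refs: dict[str, frozenset[DataFile]]) -> set[DataFile]:
    """Union of files reachable from every live ref (main plus all branches).

    Simplifying assumption: each ref pins only the files in its head snapshot.
    """
    pinned: set[DataFile] = set()
    for files in refs.values():
        pinned |= files
    return pinned


def retained_bytes(refs: dict[str, frozenset[DataFile]]) -> int:
    """Bytes that garbage collection must keep because some ref reaches them."""
    return sum(f.size_bytes for f in pinned_files(refs))


# Base snapshot: two small files for one day partition.
a = DataFile("day=2026-01-01/a.parquet", 100)
b = DataFile("day=2026-01-01/b.parquet", 100)
base = frozenset({a, b})

# Later, main compacts a and b into a single file c.
c = DataFile("day=2026-01-01/c.parquet", 180)
main_after_compaction = frozenset({c})

only_main = {"main": main_after_compaction}
with_abandoned = {
    "main": main_after_compaction,
    # Three abandoned branches still point at the pre-compaction snapshot.
    **{f"analyst_{i}": base for i in range(3)},
}

print(retained_bytes(only_main))       # 180
print(retained_bytes(with_abandoned))  # 380
```

In this model, branches that share a base pin it only once; the extra storage shows up as main rewrites or expires files the branches still reference, plus whatever each branch writes on its own.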

We now show a projected storage delta the moment a branch is opened, calculated from the manifest of the base snapshot and the user's write plan. Branches with no writes for 7 days get auto-archived with a 72-hour reminder window.
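A minimal sketch of both mechanisms, under stated assumptions: the base manifest is summarized as bytes per partition, the write plan as a hypothetical `(mode, estimated_bytes)` pair per partition, and the 72-hour window is read as a reminder sent 72 hours before the 7-day archive deadline. The function names and data shapes are illustrative, not DXData's.

```python
from datetime import datetime, timedelta
from typing import Literal

Mode = Literal["append", "overwrite"]

# Assumption: reminder goes out 72 hours before the 7-day idle deadline.
IDLE_LIMIT = timedelta(days=7)
REMINDER_WINDOW = timedelta(hours=72)


def projected_storage_delta(
    base_manifest: dict[str, int],
    write_plan: dict[str, tuple[Mode, int]],
) -> dict[str, int]:
    """Estimate storage impact at the moment a branch is opened (hypothetical).

    base_manifest: bytes per partition in the base snapshot.
    write_plan: per partition, the write mode and estimated bytes to write.
    """
    # Everything the branch writes is unshared with main.
    new_bytes = sum(size for _, size in write_plan.values())
    # Base files the branch keeps referencing: every partition it does not
    # overwrite. These become extra storage only if main later rewrites them.
    shared_bytes = sum(
        size
        for partition, size in base_manifest.items()
        if write_plan.get(partition, ("append", 0))[0] != "overwrite"
    )
    return {"new_bytes": new_bytes, "pinned_if_main_rewrites": shared_bytes}


def lifecycle_action(last_write: datetime, now: datetime) -> str:
    """Decide what to do with a branch based on time since its last write."""
    idle = now - last_write
    if idle >= IDLE_LIMIT:
        return "archive"
    if idle >= IDLE_LIMIT - REMINDER_WINDOW:
        return "remind"
    return "keep"


base_manifest = {"day=01": 500, "day=02": 300}
write_plan = {"day=02": ("overwrite", 320), "day=03": ("append", 50)}
print(projected_storage_delta(base_manifest, write_plan))
# {'new_bytes': 370, 'pinned_if_main_rewrites': 500}

last_write = datetime(2026, 3, 1)
print(lifecycle_action(last_write, last_write + timedelta(days=3)))  # keep
print(lifecycle_action(last_write, last_write + timedelta(days=5)))  # remind
print(lifecycle_action(last_write, last_write + timedelta(days=7)))  # archive
```

Splitting the estimate into "bytes you will write" and "bytes you will keep alive" is one way to make the invisible half of branch cost visible up front; a production version would also need to send the reminder once rather than on every check.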

Sharp edge #2: merge conflicts on append-only tables

We shipped with a simple merge strategy: if two branches touch the same partition, the merge blocks and prompts for resolution. Sounds correct. In practice, almost every production table is append-only, and two branches "touching the same partition" just means two people added new rows to the same day.
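The launch-era rule can be sketched in a few lines. `Change` and `old_merge` are hypothetical names; the point is that the check compares partitions only and never looks at the operation, so two plain appends to the same day collide.

```python
from typing import NamedTuple


class Change(NamedTuple):
    """One committed change since the fork point (hypothetical model)."""
    partition: str
    operation: str  # e.g. "append", "overwrite", "delete"


def old_merge(main: list[Change], branch: list[Change]) -> tuple[str, set[str]]:
    """Launch-era rule: block if both sides touched any common partition.

    Note that `operation` is never consulted.
    """
    overlap = {c.partition for c in main} & {c.partition for c in branch}
    return ("blocked", overlap) if overlap else ("merged", set())


# Two people each add new rows for the same day.
main_changes = [Change("day=2026-03-20", "append")]
branch_changes = [Change("day=2026-03-20", "append")]
print(old_merge(main_changes, branch_changes))
# ('blocked', {'day=2026-03-20'})
```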

merge_strategy.sql

-- old behavior: merge blocks
MERGE BRANCH analyst_backfill INTO main
  ON CONFLICT (partition) FAIL;

-- new behavior for append-only tables
MERGE BRANCH analyst_backfill INTO main
  ON CONFLICT (partition) UNION ALL WHEN APPEND_ONLY;

We added an explicit ON CONFLICT (partition) UNION ALL WHEN APPEND_ONLY strategy and made it the default when the catalog can prove both branches are append-only, which catches ~80% of real-world merges.
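One way a catalog could prove append-only, sketched under assumptions: Iceberg records an `operation` (`append`, `replace`, `overwrite`, or `delete`) in each snapshot summary, so a side of the merge counts as append-only if every snapshot since the fork point is an `append`. The `Snapshot` shape, the fork-point walk, and `choose_merge` are hypothetical and not DXData's implementation; treating `replace` (compaction) as non-append is a conservative choice the post does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    operation: str  # Iceberg summary operation: append, replace, overwrite, delete
    partitions: frozenset[str]


def commits_since(snaps: dict[int, Snapshot], head: int, fork: int) -> list[Snapshot]:
    """Walk parent pointers from `head` back to (but excluding) `fork`."""
    commits: list[Snapshot] = []
    current = head
    while current != fork:
        snap = snaps[current]
        commits.append(snap)
        if snap.parent_id is None:
            raise ValueError(f"snapshot {fork} is not an ancestor of {head}")
        current = snap.parent_id
    return commits


def is_append_only(commits: list[Snapshot]) -> bool:
    # Conservative: anything other than a plain append (including a
    # compaction recorded as "replace") disqualifies this side.
    return all(s.operation == "append" for s in commits)


def choose_merge(snaps: dict[int, Snapshot], main_head: int, branch_head: int, fork: int) -> str:
    main_commits = commits_since(snaps, main_head, fork)
    branch_commits = commits_since(snaps, branch_head, fork)
    main_parts = set().union(*(s.partitions for s in main_commits))
    branch_parts = set().union(*(s.partitions for s in branch_commits))
    if not (main_parts & branch_parts):
        return "merge"      # no shared partitions, nothing to resolve
    if is_append_only(main_commits) and is_append_only(branch_commits):
        return "union_all"  # both sides only added rows: keep both
    return "fail"           # a real conflict: block and ask the user


snaps = {s.snapshot_id: s for s in [
    Snapshot(1, None, "append", frozenset({"day=19"})),  # fork point
    Snapshot(2, 1, "append", frozenset({"day=20"})),     # main
    Snapshot(3, 1, "append", frozenset({"day=20"})),     # branch: appends
    Snapshot(4, 3, "overwrite", frozenset({"day=20"})),  # branch: then rewrites
    Snapshot(5, 1, "append", frozenset({"day=21"})),     # another branch
]}
print(choose_merge(snaps, main_head=2, branch_head=3, fork=1))  # union_all
print(choose_merge(snaps, main_head=2, branch_head=4, fork=1))  # fail
print(choose_merge(snaps, main_head=2, branch_head=5, fork=1))  # merge
```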

Sharp edge #3: the UI lied about what a branch was

We called them "branches" because they walk and quack like git branches. They are not git branches. They are catalog-level references to an Iceberg snapshot tree. That distinction matters when someone rebases.

We renamed the action in the UI from "rebase" to "fast-forward to latest", which is what it has always actually done, and the support tickets dropped by half in a week.
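The distinction is clearer with the ref model written out. In this hypothetical sketch (the `Catalog` class and its methods are illustrative, not a DXData or Iceberg API), a branch is just a name pointing at a snapshot id, and fast-forward moves that pointer only when the branch has no snapshots of its own since it diverged. The post does not say what DXData does when a branch has diverged; raising an error here is an assumption borrowed from git's fast-forward semantics.

```python
from typing import Optional


class Catalog:
    """Toy catalog: refs are names pointing at snapshot ids (hypothetical)."""

    def __init__(self) -> None:
        self.parents: dict[int, Optional[int]] = {}
        self.refs: dict[str, int] = {}

    def commit(self, ref: str, snapshot_id: int) -> None:
        # A new snapshot's parent is whatever the ref pointed at before.
        self.parents[snapshot_id] = self.refs.get(ref)
        self.refs[ref] = snapshot_id

    def create_branch(self, name: str, source: str = "main") -> None:
        # Branching copies a pointer, not data.
        self.refs[name] = self.refs[source]

    def is_ancestor(self, ancestor: int, snapshot: Optional[int]) -> bool:
        current = snapshot
        while current is not None:
            if current == ancestor:
                return True
            current = self.parents[current]
        return False

    def fast_forward(self, branch: str, target: str = "main") -> None:
        """Move `branch` to `target`'s head; refuse if the branch has diverged."""
        if not self.is_ancestor(self.refs[branch], self.refs[target]):
            raise ValueError(f"{branch} has diverged from {target}")
        self.refs[branch] = self.refs[target]


cat = Catalog()
cat.commit("main", 1)
cat.create_branch("qa")
cat.commit("main", 2)
cat.fast_forward("qa")   # qa had no snapshots of its own: pointer moves
print(cat.refs["qa"])    # 2

cat.commit("qa", 3)      # qa now has its own snapshot
cat.commit("main", 4)
try:
    cat.fast_forward("qa")
except ValueError as err:
    print(err)           # qa has diverged from main
```

A git rebase would instead replay snapshot 3 as a new snapshot on top of 4. A catalog that only moves pointers never does that, which is why "fast-forward to latest" describes the action accurately.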

What we would not build again

We shipped per-branch quotas in month two and deprecated them in month five. Nobody used them. In retrospect we should have just raised the default limits twice and put storage telemetry in front of the user.


Written by

Ren Takahashi

Engineering, Catalog at DXData.

Start building with DXData.

Spin up a catalog in minutes. Bring your own object store, keep your existing SQL, and branch your data like code.


Branching production tables: a 6-month postmortem

Sharp edge #1: branch cost was invisible

Sharp edge #2: merge conflicts on append-only tables

Sharp edge #3: the UI lied about what a branch was

What we would not build again

Read next

How we run 4.2B rows/day through a single Iceberg catalog

Query federation: what finally made it fast

Why we built Nessie-style branching into DXData
