Six months ago we rolled out branchable tables to every customer on the platform. The pitch was simple: treat a data change like a code change. Open a branch, run your backfill, let QA query it, merge it when green.
The pitch was good. The implementation had three sharp edges that took us the full six months to round off.
Sharp edge #1: branch cost was invisible
The first week after launch, a single customer opened 420 branches against a 40 TB table. Most of those branches were abandoned once the query session ended, but their snapshots kept the underlying data files pinned. Storage jumped 3.1x in a weekend.
We now show a projected storage delta the moment a branch is opened, calculated from the manifest of the base snapshot and the user's write plan. Branches with no writes for 7 days get auto-archived with a 72-hour reminder window.
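The two mechanisms in the previous paragraph can be sketched in Python. This is a hypothetical model, not the production implementation, and it rests on several assumptions:

- `DataFile`, `WritePlan`, `projected_storage_delta`, and `lifecycle_state` are illustrative names.
- Rewrites are assumed to be copy-on-write.
- The reminder window is assumed to open 72 hours before the 7-day archive deadline. The post does not say which side of the deadline the window sits on.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Policy constants from the post; how they combine below is an assumption.
IDLE_LIMIT = timedelta(days=7)
REMINDER_WINDOW = timedelta(hours=72)


@dataclass
class DataFile:
    path: str
    partition: str
    size_bytes: int


@dataclass
class WritePlan:
    # Hypothetical write plan: bytes appended as brand-new files...
    appended_bytes: int
    # ...and partitions the user intends to rewrite (assumed copy-on-write).
    rewritten_partitions: set[str]


def projected_storage_delta(base_manifest: list[DataFile], plan: WritePlan) -> int:
    """Estimate the new bytes a branch will add on top of its base snapshot.

    Base files are shared with main, so they add nothing up front. Appends
    create new files, and each rewritten partition is assumed to produce new
    files of roughly the same size while the originals stay referenced by main.
    """
    rewritten = sum(
        f.size_bytes for f in base_manifest if f.partition in plan.rewritten_partitions
    )
    return plan.appended_bytes + rewritten


def lifecycle_state(last_write: datetime, now: datetime) -> str:
    """Classify a branch by idle time.

    Assumes the reminder opens 72h before the 7-day archive mark.
    """
    idle = now - last_write
    if idle >= IDLE_LIMIT:
        return "archived"
    if idle >= IDLE_LIMIT - REMINDER_WINDOW:
        return "reminder"
    return "active"
```

This sketch only counts bytes the branch itself writes. It does not model base files that stay pinned after main stops referencing them.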
Sharp edge #2: merge conflicts on append-only tables
We shipped with a simple merge strategy: if two branches touch the same partition, the merge blocks and prompts for resolution. That sounds correct. In practice, almost every production table is append-only, so two branches "touching the same partition" just means two people appended new rows to the same day's partition.
-- old behavior: merge blocks
MERGE BRANCH analyst_backfill INTO main
ON CONFLICT (partition) FAIL;
-- new behavior for append-only tables
MERGE BRANCH analyst_backfill INTO main
ON CONFLICT (partition) UNION ALL WHEN APPEND_ONLY;
We added an explicit ON CONFLICT (partition) UNION ALL WHEN APPEND_ONLY strategy and made it the default when the catalog can prove both branches are append-only, which catches ~80% of real-world merges.
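Here is a minimal Python sketch of the two conflict rules. It is not the catalog's actual merge code, and it relies on these assumptions:

- Each side's changes since the fork are summarized as a hypothetical `BranchChanges` record. The record holds the files appended per partition and the set of partitions where existing files were deleted or rewritten.
- "Provably append-only" is reduced to "no partition was modified."

Passing `union_append_only=False` mirrors the old `FAIL` behavior. The default mirrors `UNION ALL WHEN APPEND_ONLY`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class MergeConflict(Exception):
    """Raised when a merge must stop and ask a human to resolve it."""


@dataclass
class BranchChanges:
    # Hypothetical summary of one side's changes since the fork point.
    added: dict[str, list[str]] = field(default_factory=dict)  # partition -> new files
    modified: set[str] = field(default_factory=set)  # partitions with deletes/rewrites

    @property
    def append_only(self) -> bool:
        # Simplification: "provably append-only" means nothing was modified.
        return not self.modified

    def touched(self) -> set[str]:
        return set(self.added) | self.modified


def merge_appends(
    main_since_fork: BranchChanges,
    branch: BranchChanges,
    union_append_only: bool = True,
) -> dict[str, list[str]]:
    """Combine appended files per partition, or raise MergeConflict.

    union_append_only=False reproduces the old rule (any shared partition fails).
    Applying rewrites on non-overlapping partitions is outside this sketch.
    """
    overlap = main_since_fork.touched() & branch.touched()
    both_append_only = main_since_fork.append_only and branch.append_only
    if overlap and not (union_append_only and both_append_only):
        raise MergeConflict(f"conflicting partitions: {sorted(overlap)}")
    # UNION ALL: keep every appended file from both sides, with no deduplication.
    merged = {p: list(files) for p, files in main_since_fork.added.items()}
    for partition, files in branch.added.items():
        merged.setdefault(partition, []).extend(files)
    return merged
```

UNION ALL rather than a deduplicating union fits append-only data: each side's new rows are assumed to be distinct additions, so keeping both is the intended result.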
Sharp edge #3: the UI lied about what a branch was
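We called them "branches" because they walk and quack like git branches. They are not git branches. They are catalog-level references to an Iceberg snapshot tree. That distinction matters when someone rebases: in git, a rebase replays your commits on top of a new base, and that is not what our "rebase" action did.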
We renamed the action in the UI from "rebase" to "fast-forward to latest", which is what it has always actually done, and the support tickets dropped by half in a week.
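To make the distinction concrete, here is a toy Python model of a branch as a named pointer into a snapshot tree. `Catalog`, `Snapshot`, and all method names are hypothetical and are not Iceberg's API. Fast-forward only moves the pointer and never replays commits the way a git rebase does. The sketch also assumes fast-forward is refused when the branch has diverged from its target. The post does not say how that case is handled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]  # None for the first snapshot in the tree


@dataclass
class Catalog:
    # Hypothetical toy catalog: snapshots form a tree via parent_id,
    # and refs (branches) are just names pointing at a snapshot id.
    snapshots: dict[int, Snapshot] = field(default_factory=dict)
    refs: dict[str, int] = field(default_factory=dict)

    def commit(self, ref: str, snapshot_id: int) -> None:
        """Add a snapshot on top of `ref` and advance `ref` to it."""
        self.snapshots[snapshot_id] = Snapshot(snapshot_id, self.refs.get(ref))
        self.refs[ref] = snapshot_id

    def create_branch(self, name: str, from_ref: str = "main") -> None:
        """A new branch copies no data; it points at the same snapshot."""
        self.refs[name] = self.refs[from_ref]

    def is_ancestor(self, ancestor_id: int, snapshot_id: int) -> bool:
        """Walk parent links from snapshot_id looking for ancestor_id."""
        current: Optional[int] = snapshot_id
        while current is not None:
            if current == ancestor_id:
                return True
            current = self.snapshots[current].parent_id
        return False

    def fast_forward(self, branch: str, target: str = "main") -> None:
        """Move `branch` to `target`'s snapshot without replaying any commits."""
        if not self.is_ancestor(self.refs[branch], self.refs[target]):
            raise ValueError(f"{branch!r} has diverged from {target!r}")
        self.refs[branch] = self.refs[target]
```

In this model a user who expected a rebase would see their branch either jump to main's snapshot or be refused. Their own snapshots would never be re-applied on top of main.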
What we would not build again
We shipped per-branch quotas in month two and deprecated them in month five. Nobody used them. In retrospect we should have just raised the default limits twice and put storage telemetry in front of the user.
Written by
Ren Takahashi
Engineering, Catalog at DXData.