Engineering · January 7, 2026 · 10 min read

Query federation: what finally made it fast

Federated queries across Iceberg, Postgres, and object stores should not be slow. We rebuilt our planner around three ideas that finally made the fast path consistently fast.

Kai Lindstrom, Engineering, Pipelines

Federated query is one of those features that demos beautifully and then disappoints in production. We have been through two major planner rewrites to fix that. Here is what stuck.

Push down aggressively, then verify

Our old planner tried to be clever about which predicates were safe to push down into a remote source. Our new planner pushes everything and then verifies the result against a locally evaluated sample. If the two disagree, we fall back to the unpushed plan. The verification step is cheap and catches the dialect edge cases that used to be unfixable in the planner.
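The push-then-verify loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the production planner: `InMemorySource`, `run_with_verified_pushdown`, the `id` row key, and the sample size are hypothetical names chosen for the example, and the "sample" here is simply the first N raw rows.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class InMemorySource:
    """Hypothetical stand-in for a remote source.

    `remote_predicate` models how the remote dialect evaluates the
    pushed-down filter, which may differ from local semantics.
    """

    def __init__(self, rows: List[Row], remote_predicate: Predicate):
        self.rows = rows
        self.remote_predicate = remote_predicate

    def scan_pushed(self) -> List[Row]:
        # The filter runs "remotely", with the remote's semantics.
        return [r for r in self.rows if self.remote_predicate(r)]

    def scan_raw(self, limit: Optional[int] = None) -> List[Row]:
        # Unfiltered rows; a real sampler would likely be randomized.
        return self.rows if limit is None else self.rows[:limit]


def run_with_verified_pushdown(
    source: InMemorySource,
    local_predicate: Predicate,
    sample_size: int = 100,
    key: str = "id",
) -> Tuple[List[Row], str]:
    """Push the filter down, then check it against a locally evaluated sample."""
    pushed = source.scan_pushed()
    pushed_keys = {row[key] for row in pushed}

    for row in source.scan_raw(limit=sample_size):
        # Local and remote must agree on whether each sampled row matches.
        if local_predicate(row) != (row[key] in pushed_keys):
            # Disagreement: fall back to the unpushed plan.
            fallback = [r for r in source.scan_raw() if local_predicate(r)]
            return fallback, "fallback"

    return pushed, "pushed"


rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "ada"},
    {"id": 3, "name": "Bob"},
]

# Local semantics: case-sensitive equality.
local = lambda r: r["name"] == "ada"
# Assumed dialect quirk: the remote compares case-insensitively.
remote = InMemorySource(rows, lambda r: str(r["name"]).lower() == "ada")

result, plan = run_with_verified_pushdown(remote, local)
```

In this example the remote source matches both "Ada" and "ada", the local check on the sample disagrees for row 1, and the query falls back to local filtering. A sample can only catch disagreements on rows it happens to include, so the sample size trades verification cost against coverage.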

One coordinator per source

Running a single coordinator across all sources made for nice topology diagrams and terrible tail latencies. We now spin up a per-source coordinator with its own connection pool and its own adaptive timeout.
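As a rough sketch of what "its own connection pool and its own adaptive timeout" can look like, the class below gives each source a bounded number of slots and derives its timeout from that source's recent latencies. Everything here is an assumption made for illustration: the `SourceCoordinator` name, the p95-times-headroom rule, the window size, and the clamp values are not taken from the actual implementation.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class SourceCoordinator:
    """Hypothetical per-source coordinator: bounded slots plus a latency-derived timeout."""

    def __init__(
        self,
        name: str,
        pool_size: int = 4,
        base_timeout_s: float = 5.0,
        min_timeout_s: float = 0.1,
        max_timeout_s: float = 30.0,
        window: int = 50,
        headroom: float = 2.0,
    ):
        self.name = name
        self.base_timeout_s = base_timeout_s
        self.min_timeout_s = min_timeout_s
        self.max_timeout_s = max_timeout_s
        self.headroom = headroom
        # Stand-in for a connection pool: at most `pool_size` calls in flight.
        self._slots = threading.BoundedSemaphore(pool_size)
        self._latencies: Deque[float] = deque(maxlen=window)
        self._lock = threading.Lock()

    def record_latency(self, seconds: float) -> None:
        with self._lock:
            self._latencies.append(seconds)

    def timeout_s(self) -> float:
        """Headroom times the recent p95 latency, clamped to [min, max]."""
        with self._lock:
            ordered = sorted(self._latencies)
        if not ordered:
            return self.base_timeout_s
        p95 = ordered[min(len(ordered) - 1, (95 * len(ordered)) // 100)]
        return max(self.min_timeout_s, min(self.max_timeout_s, self.headroom * p95))

    def run(self, call: Callable[[float], T]) -> T:
        """Run `call(timeout)` in one of this source's slots and record its latency."""
        timeout = self.timeout_s()
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError(f"{self.name}: no free connection within {timeout:.2f}s")
        start = time.monotonic()
        try:
            return call(timeout)
        finally:
            # Failed calls are recorded too, so a struggling source widens its own timeout.
            self.record_latency(time.monotonic() - start)
            self._slots.release()


# One coordinator per source, so a slow source cannot starve the others.
coordinators = {name: SourceCoordinator(name) for name in ("iceberg", "postgres", "s3")}
```

Because each coordinator keeps its own latency window, a slow object store only stretches its own timeout, while a fast Postgres source keeps a tight one.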

Caching is not optional

We cache the result of every remote scan for up to 60 seconds, keyed on (source, filter, columns). In steady-state BI workloads that simple rule pushes the cache hit rate above 80% and takes most of the wall-clock cost out of the federated path.
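The caching rule is simple enough to sketch in a few lines. The version below is a minimal in-process illustration, assuming the filter is available as a string and the columns as an ordered list; `ScanCache`, `get_or_scan`, and the injectable clock are hypothetical names, and expired entries are only replaced on the next lookup rather than evicted in the background.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Rows = List[dict]
Key = Tuple[str, str, Tuple[str, ...]]


class ScanCache:
    """Hypothetical TTL cache for remote scan results, keyed on (source, filter, columns)."""

    def __init__(self, ttl_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Key, Tuple[float, Rows]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_scan(
        self,
        source: str,
        filter_sql: str,
        columns: Sequence[str],
        scan: Callable[[], Rows],
    ) -> Rows:
        key: Key = (source, filter_sql, tuple(columns))
        now = self._clock()
        entry = self._entries.get(key)

        # Serve from cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl_s:
            self.hits += 1
            return entry[1]

        # Miss or expired: run the remote scan and store the fresh result.
        self.misses += 1
        rows = scan()
        self._entries[key] = (now, rows)
        return rows


cache = ScanCache(ttl_s=60.0)
rows = cache.get_or_scan(
    "postgres",
    "region = 'eu'",
    ["order_id", "total"],
    scan=lambda: [{"order_id": 1, "total": 9.5}],  # stand-in for the remote scan
)
```

Repeating the same call within 60 seconds returns the stored rows without invoking `scan` again, which is where the steady-state hit rate comes from when dashboards reissue identical queries.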

Written by

Kai Lindstrom

Engineering, Pipelines at DXData.
