Query Engine

Interactive SQL, everywhere your data lives.

Run ANSI SQL across Iceberg, Postgres, Snowflake, S3, and 100+ sources without moving a byte. Sub-second cache hits. Full time-travel.

Federated in a single SELECT

Join an Iceberg fact table, an OLTP dimension, and an S3 event stream without ETL. One query, one result set.

Sub-second result cache

Adaptive result, partition, and query-plan caches turn repeat dashboards into 23-millisecond reads.

Time-travel to any snapshot

Query Iceberg snapshots and Nessie branches by timestamp or commit hash — auditable, reproducible, reversible.

federation_demo.sql

-- joining Iceberg, Postgres, and S3 in one plan
SELECT o.region,
       c.tier,
       COUNT(DISTINCT e.session_id) AS sessions,
       SUM(o.total_cents) AS revenue
FROM iceberg.sales.orders o
JOIN postgres.crm.customers c USING (customer_id)
JOIN s3.events.web_sessions e USING (customer_id)
WHERE o.created_at > DATE '2026-01-01'
GROUP BY o.region, c.tier;
12 rows · 284ms · 3 sources federated
pushdown: 100%

One query, three engines, one result set.

// federation

One planner across every source you own.

The query planner inspects every catalog in the FROM clause, pushes predicates and projections into each source adapter, and lets native engines handle the work they are best at — Postgres indexes, Iceberg partition pruning, Snowflake columnar scans.

Results stream back into a single coordinator that stitches joins, applies group-by, and emits one result set. No copy step, no staging bucket, no lag window between the systems you already run and the answers your team needs.
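
One way to see how much of a federated query reaches each source is to ask the planner for its plan. The sketch below assumes the engine accepts Trino's EXPLAIN statement. That seems plausible for a managed Trino, but this page does not confirm it. The catalog and table names are the ones used in federation_demo.sql.

```sql
-- Assumption: Trino-style EXPLAIN is available; the plan text format is engine-specific.
EXPLAIN (TYPE DISTRIBUTED)
SELECT o.region,
       c.tier,
       SUM(o.total_cents) AS revenue
FROM iceberg.sales.orders o
JOIN postgres.crm.customers c USING (customer_id)
WHERE o.created_at > DATE '2026-01-01'
GROUP BY o.region, c.tier;
```

In Trino's plan output, a predicate that was fully pushed down typically appears as part of the table scan for that source rather than as a separate filter operator on the coordinator.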

latency/p99 · orders_last_30d.sql

p99 latency · identical query · 104x faster on hit

Cold query: 2.4s
Warm plan: 420ms
Partition cache: 96ms
Result cache: 23ms

3 cache layers · TTL per dataset · invalidate on write
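
The 104x figure follows from the two ends of the panel: the cold-query latency divided by the result-cache latency. The same division gives the speedup of the intermediate layers.

```latex
\[
\frac{2.4\ \text{s}}{23\ \text{ms}} = \frac{2400}{23} \approx 104.3,
\qquad
\frac{2400}{96} = 25,
\qquad
\frac{2400}{420} \approx 5.7
\]
```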

// caching

Sub-second cache hits, three layers deep.

DXData memoizes work at every tier of the planner. The result cache replays identical queries in tens of milliseconds, the partition cache short-circuits scans on stable segments, and the plan cache skips parsing and optimization on hot templates.

Every layer is keyed by dataset, invalidated on writes, and tunable per workspace — so overnight refreshes and live dashboards can share the same engine without fighting each other.

time_travel.sql

-- query an exact snapshot by hash
SELECT *
FROM iceberg.sales.orders
FOR VERSION AS OF '8f91c3a2b04d';
 
-- or roll back by wall-clock time
SELECT COUNT(*)
FROM iceberg.sales.orders
FOR TIMESTAMP AS OF '2026-04-18 10:00:00';
 
-- or read the table as it exists on a Nessie branch
SELECT *
FROM iceberg.sales.orders@experiment/q2-pricing;

// time-travel

Query any snapshot, any branch, any moment.

Iceberg tracks an immutable history of every commit to every table, and Nessie layers git-style branches on top of that history. The query engine speaks both — point at a snapshot hash, a wall-clock timestamp, or a named branch and the planner will resolve the exact file set.

Reproduce a bug from last Tuesday, preview a schema change on a branch, or audit which exact rows powered a regulatory report — all through SQL, no infrastructure gymnastics.
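
As a sketch of the audit case, two snapshots of the same table can be compared with a set operator. The timestamps are illustrative, and the literal form simply follows time_travel.sql; whether the engine needs a typed TIMESTAMP literal here is not stated on this page.

```sql
-- Rows that exist at the later timestamp but not (or not identically) at the earlier one.
-- Both timestamps are made-up values for illustration.
SELECT *
FROM iceberg.sales.orders
FOR TIMESTAMP AS OF '2026-04-18 10:00:00'
EXCEPT
SELECT *
FROM iceberg.sales.orders
FOR TIMESTAMP AS OF '2026-04-17 10:00:00';
```

Swapping the two halves of the EXCEPT returns the rows that were deleted or changed in the other direction.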

// sql.compat

ANSI SQL, plus the parts you actually wanted.

Every query you have already written — including window functions, CTEs, recursive queries, and grouping sets — runs unmodified. The engine reports the same standard error codes and planner hints as Trino.
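
A small, self-contained illustration of three of those features together (a CTE, grouping sets, and a window function). The inline rows are made-up sample data, so the query does not depend on any catalog.

```sql
-- Sample rows are invented for illustration only.
WITH orders (region, order_day, total_cents) AS (
    VALUES
        ('emea', DATE '2026-01-01', 1200),
        ('emea', DATE '2026-01-02',  800),
        ('apac', DATE '2026-01-01',  500)
),
daily AS (
    -- one row per (region, day) plus one subtotal row per region
    SELECT region, order_day, SUM(total_cents) AS revenue
    FROM orders
    GROUP BY GROUPING SETS ((region, order_day), (region))
)
SELECT region,
       order_day,   -- NULL on the per-region subtotal rows
       revenue,
       -- ranks regions within each day; subtotal rows rank against each other
       RANK() OVER (PARTITION BY order_day ORDER BY revenue DESC) AS rank_in_day
FROM daily
ORDER BY region, order_day;
```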

On top of ANSI, DXData ships opinionated extensions for the work analysts actually do: MATCH_RECOGNIZE for sessionization, a geospatial toolbox, window-frame exclusions, and array and map UDFs that compile to efficient vectorized operators.
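
Sessionization with MATCH_RECOGNIZE might look like the sketch below. It is written in Trino's MATCH_RECOGNIZE syntax; the sample rows, the column names, and the 30-minute inactivity gap are assumptions chosen for illustration.

```sql
-- A session is a first event (A) followed by any number of events (B)
-- that each arrive within 30 minutes of the previous one.
SELECT user_id, session_no, session_start, session_end, events
FROM (
    VALUES
        ('u1', TIMESTAMP '2026-01-01 09:00:00'),
        ('u1', TIMESTAMP '2026-01-01 09:10:00'),
        ('u1', TIMESTAMP '2026-01-01 11:00:00'),
        ('u2', TIMESTAMP '2026-01-01 09:05:00')
) AS clicks (user_id, ts)
MATCH_RECOGNIZE (
    PARTITION BY user_id
    ORDER BY ts
    MEASURES
        MATCH_NUMBER() AS session_no,
        FIRST(ts)      AS session_start,
        LAST(ts)       AS session_end,
        COUNT(*)       AS events
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A B*)
    DEFINE B AS ts <= PREV(ts) + INTERVAL '30' MINUTE
);
```

With these sample rows, u1 should split into two sessions (the 11:00 event is more than 30 minutes after 09:10) and u2 into one.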

sql_compat.sql

-- ANSI SQL, runs anywhere
SELECT region,
       COUNT(*) AS orders,
       SUM(total_cents) AS revenue
FROM iceberg.sales.orders
WHERE created_at >= DATE '2026-01-01'
GROUP BY region
ORDER BY revenue DESC
LIMIT 25;

// benchmarks

Numbers from a real workload.

p99 cached: 23ms
p99 cold: 2.1s
peak throughput: 12.4K QPS
scanned: 4.2B rows/day

// how it works

From keystroke to result in five stages.

Every query traverses the same deterministic pipeline — the difference between 23 milliseconds and 2 seconds is which stages can short-circuit on cached work.

Parse: ANSI SQL compiled into a validated, typed logical plan with source-aware identifiers.
Optimize: Cost-based optimizer reorders joins, pushes predicates down, and prunes partitions.
Plan: Physical plan splits work into stages across federated source adapters and workers.
Execute: Stages stream rows in parallel with adaptive parallelism and workload isolation.
Cache: Results, partitions, and plans are memoized per-dataset with automatic invalidation.

// connectors

Query from anywhere.

Some of the most common sources are listed below; the full catalog spans 100+ native connectors across databases, warehouses, SaaS tools, object stores, and streams.

Iceberg

Postgres

Snowflake

BigQuery

Databricks

Redshift

ClickHouse

MongoDB

Kafka

MySQL

Delta Lake

// use cases

Built for the three places SQL actually runs.

BI dashboards

Back Looker, Tableau, and Superset with cached results that stay fresh through automatic invalidation.

SELECT region, SUM(revenue) FROM mart.orders GROUP BY 1;

Ad-hoc analytics

Explore raw events and production tables side-by-side without waiting on ingestion or modeling cycles.

SELECT * FROM postgres.app.users
WHERE email LIKE '%@acme.com' LIMIT 100;

Operational analytics

Embed low-latency SQL directly inside product features — search, personalization, billing rollups.

SELECT COUNT(*) FROM events
WHERE user_id = :id AND ts > NOW() - INTERVAL '1' DAY;

// faq

Questions the engineers ask first.

How is this different from Trino?

DXData runs a managed, hardened Trino with additional cost-based optimizations, a multi-layer result cache, tight Iceberg and Nessie integration, and first-class SSO, RBAC, and audit. You get the open SQL surface of Trino without owning the cluster, tuning, or upgrade cycle.

Do my queries leave my VPC?

Not if you deploy query workers in your own VPC (AWS, GCP, or Azure). In that setup the control plane stays managed, but data plane traffic terminates inside your network and connector credentials never transit DXData infrastructure.

What about row-level security?

The engine enforces attribute-based policies at compile time. Row, column, and table predicates are injected into the optimized plan before execution, so users cannot bypass them with view rewrites or projection tricks. Every decision is logged to an immutable audit trail.
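
To make the injection step concrete, here is a sketch of a query before and after a row policy is applied. The policy itself ("analysts in an EMEA group may only read rows where region = 'emea'") is hypothetical, and the page does not show how policies are declared, so only the effect on the query is illustrated.

```sql
-- What the analyst submits:
SELECT region, COUNT(*) AS orders
FROM iceberg.sales.orders
GROUP BY region;

-- What would effectively run under the hypothetical policy.
-- The WHERE predicate is injected by the engine, not written by the user.
SELECT region, COUNT(*) AS orders
FROM iceberg.sales.orders
WHERE region = 'emea'
GROUP BY region;
```

Because the predicate becomes part of the plan itself, wrapping the table in a view or selecting different columns does not remove it.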

Does it support streaming / CDC?

Yes. Iceberg CDC and Kafka topics are queryable as native tables, and materialized views can incrementally refresh from either. For hard real-time, the engine exposes a streaming-SQL surface that pushes filters and aggregations down to the source.

// related capabilities

The rest of the platform, also yours.

Data Catalog

Pipelines

Branching

Ready when you are

See it run on your data.

Point the query engine at your warehouse and lake: no migration, no ingestion, no scheduled maintenance window required.