Quickstart (10 min read)

From zero to a federated query

You'll install the CLI, authenticate, register a data source, run a query that joins your warehouse with an external Postgres, and create a branch — all in under ten minutes.

Install the CLI

The DXData CLI is a single static binary with no runtime dependencies. The installer detects your platform and drops the binary into /usr/local/bin (or the equivalent on Windows).

Prefer a package manager? We publish formulas for Homebrew, apt, and winget. See the CLI reference for alternate install paths.

terminal

# macOS / Linux
curl -fsSL https://get.dxdata.io | sh
 
# Then verify
dxdata --version
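
As a rough illustration of what "detects your platform" usually means for installers of this kind, the sketch below maps uname output to an OS/architecture pair. It is not the actual get.dxdata.io script; the detect_platform name and the amd64/arm64 labels are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch only: NOT the real DXData installer.
# Maps `uname` output to an "os-arch" string, as many installers do.

detect_platform() {
  os=$(uname -s | tr '[:upper:]' '[:lower:]')   # e.g. linux, darwin
  arch=$(uname -m)                              # e.g. x86_64, arm64, aarch64

  # Normalize common architecture aliases (label names are an assumption).
  case "$arch" in
    x86_64|amd64)  arch=amd64 ;;
    arm64|aarch64) arch=arm64 ;;
  esac

  printf '%s-%s\n' "$os" "$arch"
}

detect_platform
```

If dxdata --version reports "command not found" after the install, running command -v dxdata is a quick check: it prints nothing when the install directory is not on your PATH.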


Authenticate

dxdata login kicks off an OIDC device-code flow. A short code appears in your terminal and a browser window opens to your workspace's login page. Paste the code, approve the session, and you're done.

Tokens are stored in your OS keychain — never on disk as plaintext — and refreshed automatically. CI/CD environments should use workspace API keys instead; see the API authentication guide.

terminal

dxdata login
# Opens a browser window.
# Paste the short code shown in your terminal.
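
For scripts that run both on a laptop and in a pipeline, a small guard can keep the interactive login from stalling a CI job. The sketch below assumes the common convention that CI systems set a CI environment variable (this is not something DXData defines), and auth_mode is a hypothetical helper name. How a workspace API key is actually supplied is covered in the API authentication guide and is deliberately left out here.

```shell
#!/bin/sh
# Sketch: choose an auth path based on whether we are running in CI.
# Assumption: the CI system sets the CI variable (a widespread convention).
# auth_mode is a hypothetical helper, not part of the DXData CLI.

auth_mode() {
  if [ -n "${CI:-}" ]; then
    echo "api-key"       # non-interactive: use a workspace API key
  else
    echo "interactive"   # developer machine: device-code login
  fi
}

# Example use in a setup script:
#   [ "$(auth_mode)" = "interactive" ] && dxdata login
```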


Connect your first data source

DXData federates queries across external systems. Describe a source once, grant the necessary read credentials, and the engine's optimizer handles predicate pushdown and parallel scans for you.

Save the YAML below and apply it with dxdata apply ~/.dxdata/sources/analytics-pg.yaml. Sources live in your workspace config, so colleagues with access will see them too.

~/.dxdata/sources/analytics-pg.yaml

# ~/.dxdata/sources/analytics-pg.yaml
kind: source
name: analytics-pg
type: postgresql
host: analytics.internal
port: 5432
database: warehouse
auth:
  secret: pg-analytics-ro
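
One way to do the save-and-apply step entirely from a shell is sketched here. SOURCES_DIR is a hypothetical convenience variable, not a DXData setting, and the apply step runs only when the dxdata binary is found on your PATH. Note that the script overwrites any existing file at that path.

```shell
#!/bin/sh
# Sketch: write the source definition from this page and apply it.
# SOURCES_DIR is a hypothetical helper variable, not a DXData setting.
SOURCES_DIR="${SOURCES_DIR:-$HOME/.dxdata/sources}"
SOURCE_FILE="$SOURCES_DIR/analytics-pg.yaml"

mkdir -p "$SOURCES_DIR"

# Quoting 'EOF' stops the shell from expanding anything inside the YAML.
cat > "$SOURCE_FILE" <<'EOF'
kind: source
name: analytics-pg
type: postgresql
host: analytics.internal
port: 5432
database: warehouse
auth:
  secret: pg-analytics-ro
EOF

# Apply only when the CLI is installed; otherwise just report the path.
if command -v dxdata >/dev/null 2>&1; then
  dxdata apply "$SOURCE_FILE"
else
  echo "wrote $SOURCE_FILE (dxdata not found on PATH, skipping apply)"
fi
```

The file holds only a reference to a secret (pg-analytics-ro), never the password itself, so it should be safe to keep alongside other configuration.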


Run your first query

Every identifier in DXData starts with a catalog. lake is your native Iceberg catalog; the source you just registered is queryable as analytics-pg. Joins across them Just Work — no extracts, no replication.

Run it with dxdata query --file query.sql or paste it into the Worksheets UI. The planner will show you which sub-query runs where.

query.sql

-- Your first cross-source query
SELECT c.region,
       COUNT(*) AS events_today
FROM lake.events e
-- A hyphenated catalog name needs double quotes in standard SQL
JOIN "analytics-pg".public.customers c
  ON c.id = e.customer_id
WHERE e.ts >= CURRENT_DATE
GROUP BY c.region
ORDER BY events_today DESC;


Create a data branch

Branches in DXData are powered by Nessie and work like Git branches for your catalog. Every commit, table create, and schema change is a reversible operation scoped to a named branch.

This is the foundation of every safe migration, dbt-style dev workflow, and incident recovery flow you'll build on top of DXData. Read more in Core concepts.

terminal

# Create a branch off main
dxdata branch create exp/region-cohort --from main
 
# Run a destructive-looking migration safely
dxdata query --branch exp/region-cohort \
  --sql "CREATE TABLE lake.region_cohort AS ..."
 
# Merge when you are happy
dxdata branch merge exp/region-cohort --into main
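
The three steps above can be wrapped so that the merge only happens when the migration succeeds. This is a sketch, not an official workflow: migrate_on_branch is a hypothetical helper, and it assumes dxdata query accepts --branch together with --file (each flag appears on this page, but not in combination).

```shell
#!/bin/sh
# Sketch: run a migration on a branch and merge only if it succeeds.
# Assumption: `dxdata query` accepts --branch together with --file.
# migrate_on_branch is a hypothetical helper, not part of the CLI.

migrate_on_branch() {
  branch=$1
  sql_file=$2

  dxdata branch create "$branch" --from main || return 1

  # If the migration fails, stop here; main is never touched.
  if ! dxdata query --branch "$branch" --file "$sql_file"; then
    echo "migration failed on $branch; not merging" >&2
    return 1
  fi

  dxdata branch merge "$branch" --into main
}

# Example:
#   migrate_on_branch exp/region-cohort migration.sql
```

Because the failed branch is left in place rather than deleted, you can inspect what the migration did before deciding whether to retry or discard it.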

What's next?

Core concepts: The mental model behind branches, snapshots, and federation.

SQL reference: Every statement, function, and data type, with runnable examples.

REST API: Automate every CLI action with the OpenAPI-described REST endpoint.
