Data catalog

Know your data. Trust your data.

Auto-inferred schemas, column-level lineage, searchable tags, and git-style change history across every table in your stack.

Start free See the catalog in action

catalog/analytics.orders

iceberg · sales

analytics.orders

live

pii · finance · daily · tier-1

order_id        uuid          PK
customer_email  varchar(320)  PII
region          varchar(24)
total_cents     bigint
shipped_at      timestamptz   NULL

Dana Nkosi · data-platform

98 quality

14 consumers

SOC2 policy

// the problem

If you cannot find it, you cannot trust it.

Every data team has the same ghost story: the dashboard that quietly uses a deprecated column, the pipeline built on a table nobody owns, the compliance review that turns into a month of Slack archaeology. The cost is not just wasted time; it is lost confidence in the numbers.

DXData's catalog closes the loop. Every table is documented automatically, every change is auditable, and every column can be traced end-to-end. Search replaces tribal knowledge.

// three promises

Documented, lineage-aware, and versioned.

Auto-discovery

Every table, column, and constraint catalogued the moment it lands.

Iceberg, Postgres, S3, Kafka — all indexed
Schema inference with type hints
No manual YAML, ever

Learn more

Column-level lineage

Trace any column back to the source and forward to every consumer.

Parses SQL + YAML automatically
Impact analysis before migrations
Graph API for tooling

Learn more

Git-style history

Every schema change is a commit, every rollback a revert.

Diff schemas across any two commits
Tag-based release cuts
Auditable change log

Learn more

catalog/browse

8 of 1,284 tables · indexed 2m ago

analytics.orders   iceberg   pii, finance           Dana N.
crm.customers      postgres  pii, tier-1            Rami K.
marketing.events   s3        streaming, pii-hashed  Priya S.
finance.invoices   iceberg   finance, quarterly     Leo M.
product.sessions   kafka     streaming, raw         Anya V.
ops.incidents      postgres  internal, ops          Jordan P.
support.tickets    postgres  pii, support           Mei L.
ml.features_v3     iceberg   ml, tier-2             Sam O.

// catalog.auto

A living inventory of every table in your stack.

Point DXData at Iceberg, Postgres, S3, Kafka, or any of our 100+ connectors and the catalog indexes every table, view, and topic — including partitions, constraints, and inferred types.

Metadata refreshes on a schedule you control. New tables appear within minutes of creation, and dropped tables are preserved in history so nothing gets lost.

Indexes schema, stats, partitions, and constraints
Type inference with confidence hints
Works across batch, streaming, and lake sources
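
To make "type inference with confidence hints" concrete, here is a minimal sketch of one way such a hint could be computed from sampled string values. It is not DXData's inference engine: the function names, the three candidate types, and the "share of agreeing samples" definition of confidence are all assumptions made for illustration.

```python
from collections import Counter


def guess_type(value):
    """Guess a SQL-ish type for one raw string sample (hypothetical rules)."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "varchar"


def infer_column(samples):
    """Return (type, confidence): the most common guess and its share of
    the non-empty samples. Empty strings are skipped as missing values."""
    guesses = Counter(guess_type(s) for s in samples if s != "")
    if not guesses:
        return "varchar", 0.0
    name, count = guesses.most_common(1)[0]
    return name, count / sum(guesses.values())


# Three of four samples parse as integers, so the hint is bigint at 0.75.
inferred = infer_column(["1200", "450", "n/a", "980"])
```

A low confidence value is the useful part: it flags columns where a human should confirm the type before anything downstream relies on it.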

Read the discovery docs

lineage/daily_revenue

12 column edges · parsed from SQL + YAML · depth: 3 hops

// lineage.column-level

Trace any column across every hop.

DXData parses your SQL, dbt models, and pipeline YAML to build a true column-level lineage graph — no manual annotation, no tag-before-you-ship gate.

Click any column to see every upstream source that feeds it and every downstream dashboard, model, or pipeline that depends on it. Impact analysis becomes a first-class operation.

Parses SQL, YAML, and dbt manifests automatically
Graph API for custom tooling and CI checks
Survives renames, casts, and CTEs
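
Impact analysis over a column-level graph comes down to a graph traversal. The sketch below shows the idea with a plain breadth-first search; it does not use the DXData Graph API, and the edge map, the `revenue_cents` column, and the `dashboards.daily_revenue` names are hypothetical.

```python
from collections import deque

# Hypothetical column-level edges: each key feeds the columns in its list.
EDGES = {
    "analytics.orders.total_cents": ["finance.orders_daily.revenue_cents"],
    "finance.orders_daily.revenue_cents": ["dashboards.daily_revenue.revenue"],
    "analytics.orders.region": ["dashboards.daily_revenue.region"],
}


def downstream(column, edges, max_hops=None):
    """Map every column reachable from `column` to its hop distance."""
    seen = {}
    queue = deque([(column, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand past the requested depth
        for child in edges.get(node, []):
            if child != column and child not in seen:
                seen[child] = hops + 1
                queue.append((child, hops + 1))
    return seen


impact = downstream("analytics.orders.total_cents", EDGES)
```

A CI check could run the same traversal against the columns touched by a migration and fail the build when the result is not empty.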

Explore lineage

catalog/search

orders · 42 results · 18ms

owner: data-platform · tag: pii · freshness: <1h · tier: 1

analytics.orders · 2m ago
Canonical orders table: one row per order, refreshed every 15 minutes.
finance.orders_daily · 1h ago
Daily-grain summary of orders used in the finance close report.
ml.orders_features · 12m ago
Feature-engineered view of orders with 42 derived columns.

// catalog.search

Typed search that finds the right table first.

Filter by owner, tag, freshness, tier, or any combination. Chips compose as a typed query — the same query you can save, share, or bookmark as a view.

Trending tables and starred views help new hires find the canonical answer instead of guessing. Search results come back in under 100 ms.

Typed query language with autocomplete
Saved views with link sharing
"Trending tables" per team
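
As an illustration of how filter chips can compose into a single typed query, the sketch below parses `key:value` tokens and applies them to an in-memory list. The query grammar (no space after the colon, `<Nh` or `<Nm` for freshness), the `Table` fields, and the sample owners are assumptions; the real syntax is described in the search documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    owner: str
    tier: int
    minutes_since_refresh: int
    tags: set = field(default_factory=set)


def parse_query(query):
    """Split a query into 'key:value' filters and bare free-text terms."""
    filters, terms = {}, []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value:
            filters.setdefault(key, []).append(value)
        else:
            terms.append(token)
    return filters, terms


def max_age_minutes(limit):
    """Convert '<1h' or '<15m' to minutes; other forms are out of scope."""
    amount, unit = int(limit[1:-1]), limit[-1]
    return amount * 60 if unit == "h" else amount


def search(tables, query):
    """Return names of tables that satisfy every filter and every term."""
    filters, terms = parse_query(query)
    hits = []
    for t in tables:
        ok = (
            all(term in t.name for term in terms)
            and all(o == t.owner for o in filters.get("owner", []))
            and all(tag in t.tags for tag in filters.get("tag", []))
            and all(int(x) == t.tier for x in filters.get("tier", []))
            and all(t.minutes_since_refresh < max_age_minutes(f)
                    for f in filters.get("freshness", []))
        )
        if ok:
            hits.append(t.name)
    return hits


# Hypothetical catalog entries for the example.
TABLES = [
    Table("analytics.orders", "data-platform", 1, 2, {"pii", "finance"}),
    Table("finance.orders_daily", "finance-eng", 1, 60, {"finance"}),
    Table("crm.customers", "data-platform", 1, 5, {"pii"}),
]

hits = search(TABLES, "orders owner:data-platform tag:pii freshness:<1h tier:1")
```

Because the whole query is one string, saving or sharing a view only requires storing that string.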

See the search syntax

catalog/analytics.orders#about

Dana Nkosi · owner · data-platform

pii · pii-hashed · finance · quarterly

README.md

Orders — canonical fact table

One row per confirmed order. Refreshed every 15 minutes from the upstream CDC stream. Use finance.orders_daily for daily-grain reporting.

Conventions

customer_email is always lowercased
total_cents is in USD — never float
shipped_at is nullable for unfulfilled orders

// docs.as-code

Docs that live next to the data.

Every table has an owner badge, tag set, and Markdown README rendered inline. Docs live in your git repo alongside the transforms they describe — so review flows you already use cover catalog changes too.

Tag taxonomies are optional but supported: bring your own ontology, enforce allowed values, or leave it free-form.

Owner + on-call surfaced on every entry
Markdown READMEs rendered in-app
YAML taxonomies for enforced tagging
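
Enforcing a tag taxonomy amounts to checking each entry's tags against an allowed set. The sketch below assumes the YAML file has already been parsed into a dictionary; the group names and the grouping itself are hypothetical, while the tag values are taken from the examples on this page.

```python
# Hypothetical taxonomy, as it might look after parsing a YAML file.
TAXONOMY = {
    "sensitivity": {"pii", "pii-hashed", "internal"},
    "domain": {"finance", "sales", "ml", "ops", "support"},
    "cadence": {"daily", "quarterly", "streaming"},
}


def invalid_tags(tags, taxonomy):
    """Return the tags that appear in no taxonomy group, sorted by name."""
    allowed = set().union(*taxonomy.values())
    return sorted(set(tags) - allowed)


ok = invalid_tags(["pii", "finance", "daily"], TAXONOMY)
typo = invalid_tags(["pii", "financ"], TAXONOMY)
```

Run as a pre-merge check, a non-empty result would reject the change in the same review flow that covers the README.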

Adopt docs-as-code

catalog/analytics.orders#quality

Freshness         98%       <15m lag
Completeness      99.4%     0 nulls in PK
Uniqueness        100%      no dupes
Schema stability  7d clean  no drift

// quality.scored

A quality score you can actually act on.

Freshness, completeness, uniqueness, and schema stability are scored continuously for every table. Thresholds are configurable per tier: a regression on a tier-1 table pages its owner, while a scratch table stays quiet.

Scores come with trend sparklines so you can see a regression the moment it starts, not the Monday morning it ships.

Four canonical metrics, per-column where it matters
Thresholds per tier and per tag
Alerts feed into PagerDuty, Slack, and OpsGenie
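
Per-tier thresholds can be pictured as a lookup table of minimum scores. The sketch below is illustrative only: the threshold values and the rule that unlisted tiers never alert are assumptions, and the sample scores are the ones shown for analytics.orders above.

```python
# Hypothetical minimum scores (percent) per tier; tiers not listed never alert.
THRESHOLDS = {
    1: {"freshness": 95.0, "completeness": 99.0, "uniqueness": 100.0},
    2: {"freshness": 90.0, "completeness": 95.0, "uniqueness": 99.0},
}


def breaches(scores, tier, thresholds=THRESHOLDS):
    """Return the metrics scoring below the tier's minimum, sorted by name.

    A metric missing from `scores` is treated as 0 and therefore breaches.
    """
    minimums = thresholds.get(tier, {})
    return sorted(m for m, floor in minimums.items()
                  if scores.get(m, 0.0) < floor)


orders = {"freshness": 98.0, "completeness": 99.4, "uniqueness": 100.0}
healthy = breaches(orders, tier=1)
stale = breaches({**orders, "freshness": 91.0}, tier=1)
```

The same stale score would pass for a tier-2 table, which is the point of keeping thresholds per tier rather than global.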

Configure quality rules

history/analytics.orders · commit 8f3a2c1

Dana Nkosi · add region, rename order_total, drop promo_code · 3 days ago

  order_id        uuid          PK
  customer_email  varchar(320)  PII
+ region          varchar(24)
~ total_cents     bigint        (was: order_total)
- promo_code      varchar(40)
  shipped_at      timestamptz

// history.commits

Every schema change is a commit.

When a column is added, renamed, or dropped, the catalog records it as a commit — with author, message, and diff. Rollbacks are a revert, not an incident post-mortem.

Pair this with branches and you get a full write-audit-publish loop for your data shape, not just your data values.

Author, message, and diff for every change
Diff any two commits, any two branches
Immutable audit trail for compliance
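
The commit diff shown for analytics.orders can be reproduced with a small comparison of two schema snapshots. This is a sketch, not the catalog's diff engine: renames are passed in explicitly because they cannot be told apart from a drop plus an add by name alone, type changes are not detected, and the prior type of `order_total` is assumed to be `bigint`.

```python
def diff_schema(old, new, renames=None):
    """Return '+', '~' and '-' lines describing how `new` differs from `old`.

    `old` and `new` map column name to type; `renames` maps old name to
    new name.
    """
    renames = renames or {}
    previous_name = {new_name: old_name for old_name, new_name in renames.items()}
    lines = []
    for name, col_type in new.items():
        if name in previous_name:
            lines.append(f"~ {name} {col_type} (was: {previous_name[name]})")
        elif name not in old:
            lines.append(f"+ {name} {col_type}")
    for name, col_type in old.items():
        if name not in new and name not in renames:
            lines.append(f"- {name} {col_type}")
    return lines


before = {
    "order_id": "uuid",
    "customer_email": "varchar(320)",
    "order_total": "bigint",  # assumed prior type
    "promo_code": "varchar(40)",
    "shipped_at": "timestamptz",
}
after = {
    "order_id": "uuid",
    "customer_email": "varchar(320)",
    "region": "varchar(24)",
    "total_cents": "bigint",
    "shipped_at": "timestamptz",
}

changes = diff_schema(before, after, renames={"order_total": "total_cents"})
```

Swapping the two snapshots, with the rename map inverted, yields the reverting diff, which is what makes a rollback a routine operation.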

Read the history docs

// catalog.branches

Catalog entries version with your branches.

Powered by Nessie, a table's state on feature/new-metric can differ from its state on main — new columns, new tags, new docs — all previewable before anything lands in production.

main

analytics.orders

schema v3.4 · 18 cols

production

feature/new-metric

analytics.orders

schema v3.4-dev.1 · 19 cols · +1

+ margin_cents bigint (preview)

// ecosystem

Plays nicely with the catalogs you already run.

DXData exports and imports OpenMetadata, DataHub, and Amundsen payloads natively. Keep your existing index or adopt DXData as the source of truth — your call.

Amundsen

OpenMetadata

Atlan

Collibra

DataHub

Marquez

// use cases

Where the catalog earns its keep.

Data engineer onboarding

New hires can find the canonical table, its owner, and its README without pinging anyone on Slack.

Readable docs next to every schema
Owner + on-call directly in the entry
Starred tables become a guided tour

Impact analysis before migrations

Before you drop a column, see every downstream dashboard, model, and pipeline that depends on it.

Column-level blast radius
Notify consumers from the UI
Preview the change on a branch first

PII audit for compliance

Filter to every column tagged PII across every source — and prove who can see what.

One query for the SOC 2 auditor
Hashed vs raw PII split out
Access policies linked inline
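
A PII audit of this kind is a filter over column tags, split by whether the data is raw or hashed. The sketch below runs over an in-memory list rather than the catalog; the record layout and the `user_hash` column are hypothetical.

```python
# Hypothetical column records, as a catalog export might provide them.
COLUMNS = [
    {"table": "analytics.orders", "column": "customer_email", "tags": {"pii"}},
    {"table": "marketing.events", "column": "user_hash", "tags": {"pii-hashed"}},
    {"table": "analytics.orders", "column": "region", "tags": set()},
]


def pii_report(columns):
    """Group fully qualified column names into raw and hashed PII."""
    report = {"raw": [], "hashed": []}
    for col in columns:
        ref = f"{col['table']}.{col['column']}"
        if "pii" in col["tags"]:
            report["raw"].append(ref)
        elif "pii-hashed" in col["tags"]:
            report["hashed"].append(ref)
    return report


report = pii_report(COLUMNS)
```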

// faq

Frequently asked.

How long does initial catalog indexing take?

Most stacks finish a full index pass in under an hour. The catalog streams entries as it goes, so your team can start searching within minutes of connecting the first source.

Can I bring my own tags and taxonomy?

Yes. Tags are free-form by default, or you can upload a YAML taxonomy that enforces allowed values and ownership. Existing taxonomies from OpenMetadata, Amundsen, or Atlan import cleanly.

What about private fields I do not want indexed?

Mark columns, schemas, or whole sources as private and they are excluded from indexing entirely: the catalog never reads them, rather than reading them and hiding the result behind a policy. Row-level filters and masked previews are also available when you do want data indexed but gated.

Does it support streaming tables?

Yes. Kafka, Kinesis, and Pulsar topics appear alongside batch tables with their own freshness and throughput signals, and lineage carries through streaming SQL the same way it does for batch.

// keep exploring

Governance

Row, column, and table-level controls with a full audit trail.

Learn more

Branching

Zero-copy branches of your catalog, powered by Nessie.

Learn more

Observability

Freshness, quality, and lineage signals, everywhere.

Learn more

Ship faster, own your data

Modern data platform, no vendor lock-in.

Everything in DXData runs on open standards you can walk away with — Iceberg tables, Nessie history, standard SQL.

Start free trial Book a demo

