dbt lineage impact analyzer

CLI tool that parses dbt manifest.json to produce a human-readable impact report for any model change, showing all downstream dependents and their SLA tiers.
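A minimal sketch of the core lineage lookup. It assumes the standard manifest layout, where each entry under "nodes" carries a resource_type, a name, and a depends_on.nodes list of parent unique_ids. The sketch inverts those edges into a parent-to-children map and walks it breadth-first. The function names are hypothetical. Since the project lists NetworkX, the same result could plausibly come from nx.descendants on a DiGraph built from these edges. Plain BFS is used here only to keep the example dependency-free.

```python
from collections import deque


def build_child_map(manifest: dict) -> dict[str, set[str]]:
    """Invert each node's depends_on.nodes into a parent -> children map."""
    children: dict[str, set[str]] = {}
    # Models, seeds, snapshots and tests live under "nodes"; exposures are stored separately.
    for unique_id, node in manifest.get("nodes", {}).items():
        for parent in node.get("depends_on", {}).get("nodes", []):
            children.setdefault(parent, set()).add(unique_id)
    return children


def downstream_models(manifest: dict, changed: list[str]) -> set[str]:
    """Return unique_ids of all models transitively downstream of the changed models."""
    nodes = manifest.get("nodes", {})
    children = build_child_map(manifest)
    # Resolve short model names (e.g. "stg_orders") to manifest unique_ids.
    start = [
        uid for uid, node in nodes.items()
        if node.get("resource_type") == "model" and node.get("name") in changed
    ]
    seen: set[str] = set()
    queue = deque(start)
    while queue:  # breadth-first walk over parent -> child edges
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # Tests also depend on models, but only models count as downstream dependents here.
    return {uid for uid in seen if nodes.get(uid, {}).get("resource_type") == "model"}
```

A typical call would be downstream_models(json.load(open("target/manifest.json")), ["stg_orders"]), run after dbt has written the manifest (for example via dbt compile).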

Tech stack

Python, dbt, GitHub Actions, Click, NetworkX, Jinja2

Problem

In a dbt monorepo with 400+ models, engineers had no quick way to gauge the blast radius of a change before opening a PR. A breaking change to a single widely used staging model could cascade to 80+ downstream models and 12 dashboards.

What I built

A Python CLI that takes a dbt manifest.json and a list of changed model names, then produces:

  • Lineage graph: the downstream directed acyclic graph, rendered as a compact text tree or exported as JSON or DOT (for Graphviz).
  • Impact tiers: each affected model is annotated with its SLA tier (real-time, hourly, daily, best-effort) from a YAML config file, and the output is grouped by tier so engineers can see which failures would be business-critical.
  • Exposure linkage: shows which dbt exposures (dashboards, ML features, downstream APIs) depend on each changed model.
  • CI integration: runs in GitHub Actions on every PR; adds a comment with the impact summary if the blast radius exceeds a configurable threshold (default: 10 downstream models).
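The tier grouping, exposure lookup, and CI threshold check could look roughly like the sketch below. It assumes a hypothetical tier config shaped like {"models": {"fct_revenue": "real-time"}}, which is what yaml.safe_load would return for a simple YAML mapping. It also treats models missing from the config as best-effort, which is an assumption, and reads exposures from the manifest's top-level "exposures" map. All function names are illustrative, not the tool's actual API.

```python
TIER_ORDER = ["real-time", "hourly", "daily", "best-effort"]  # most critical first
DEFAULT_THRESHOLD = 10  # default blast-radius threshold from the description above


def group_by_tier(manifest: dict, impacted: set[str], tier_config: dict) -> dict[str, list[str]]:
    """Bucket impacted model unique_ids by SLA tier, most critical tier first."""
    tiers = tier_config.get("models", {})  # hypothetical config shape: model name -> tier
    grouped: dict[str, list[str]] = {tier: [] for tier in TIER_ORDER}
    for uid in sorted(impacted):
        name = manifest["nodes"][uid]["name"]
        # Assumption: models missing from the config fall back to best-effort.
        tier = tiers.get(name, "best-effort")
        grouped.setdefault(tier, []).append(name)  # unknown tiers are appended last
    return grouped


def impacted_exposures(manifest: dict, affected: set[str]) -> list[tuple[str, str]]:
    """Return sorted (name, type) pairs for exposures depending on any affected node.

    `affected` should contain the changed models' unique_ids plus everything downstream.
    """
    hits = []
    for exposure in manifest.get("exposures", {}).values():
        parents = set(exposure.get("depends_on", {}).get("nodes", []))
        if affected & parents:
            hits.append((exposure["name"], exposure.get("type", "unknown")))
    return sorted(hits)


def should_comment(impacted: set[str], threshold: int = DEFAULT_THRESHOLD) -> bool:
    """CI gate: post a PR comment only when the blast radius exceeds the threshold."""
    return len(impacted) > threshold
```

The strict greater-than comparison matches "exceeds the threshold": with the default of 10, ten downstream models stay silent and eleven trigger a comment.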

Results

Adopted by both data engineering and analytics engineering teams within a week of release. Blocked two high-impact breaking changes from reaching production in the first month.