
DORA Metrics

DORA (DevOps Research and Assessment) metrics measure software delivery performance. They originated in the Accelerate research programme (Forsgren / Humble / Kim) and are maintained today at dora.dev. The CDF announcement of Oct 2025 codified the move from four metrics to the current canonical set of five.

LumenFlow calculates all five locally and — when a control plane endpoint is configured — ships them to the cloud as structured telemetry. For CLI usage, tag taxonomy, and batching semantics, see the advanced metrics guide.

DORA groups the metrics into throughput (how fast good change flows) and instability (how often change causes rework or failure).

Deployment Frequency

How often the team releases code to production.

  • Formula: commits_in_window / days_in_window × 7
  • Unit: deploys per week
  • Aggregation: mean (DORA canonical — median doesn’t make sense for a rate)
  • Normalisation: LumenFlow scales the count to per-week regardless of the --days window so --days 7 and --days 30 produce comparable values.
  Tier     Threshold
  Elite    > 5 / week
  High     1 – 5 / week
  Medium   0.25 – 1 / week
  Low      < 0.25 / week
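The normalisation and tier cut-offs above can be sketched as follows. Function names are illustrative, not LumenFlow's API, and the handling of values that land exactly on a boundary (1 or 0.25) is an assumption — the table does not specify which side wins:

```python
def deploys_per_week(commit_count: int, window_days: int) -> float:
    """Normalise a raw deploy count in a window to a per-week rate.

    Scaling by 7 / window_days makes --days 7 and --days 30
    windows directly comparable, per the formula above.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return commit_count / window_days * 7


def frequency_tier(per_week: float) -> str:
    # Thresholds from the tier table above; exact-boundary
    # behaviour is an assumption.
    if per_week > 5:
        return "elite"
    if per_week >= 1:
        return "high"
    if per_week >= 0.25:
        return "medium"
    return "low"
```

For example, 20 deploys over a 30-day window normalise to roughly 4.67 per week, which lands in the high tier.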

Lead Time for Changes

Time from code change to running in production.

  • Formula: WU cycle time = completed_at − claimed_at
  • Unit: hours
  • Aggregation: median (DORA canonical; mean is skewed by long-tail PRs)
  • LumenFlow also reports mean and p90 so dashboards can track both central tendency and tail behaviour.
  Tier     Threshold (median)
  Elite    < 24 h
  High     < 168 h (7 d)
  Medium   < 720 h (30 d)
  Low      ≥ 720 h
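A minimal sketch of the median / mean / p90 reporting described above. The output key names mirror the `medianHours` field mentioned later in this page; the nearest-rank percentile method is an assumption, since the exact p90 algorithm isn't documented here:

```python
import math
import statistics


def lead_time_stats(cycle_hours: list[float]) -> dict[str, float]:
    """Median, mean, and p90 of WU cycle times (in hours).

    Classification uses the median (DORA canonical); mean and p90
    are reported alongside it for dashboards.
    """
    ordered = sorted(cycle_hours)
    # Nearest-rank p90 (assumption about the percentile method).
    p90_index = max(0, math.ceil(0.9 * len(ordered)) - 1)
    return {
        "medianHours": statistics.median(ordered),
        "meanHours": statistics.fmean(ordered),
        "p90Hours": ordered[p90_index],
    }
```

With cycle times of [2, 4, 6, 8, 400] hours the median is 6 h (elite), while the mean is 84 h — exactly the long-tail skew that motivates median as the canonical aggregation.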

Failed Deployment Recovery Time

Time to recover from a deploy-caused failure.

  • Formula: time between paired EMERGENCY-tagged commits (break_commit.timestamp → fix_commit.timestamp)
  • Unit: hours
  • Aggregation: median (DORA canonical)
  Tier     Threshold (median)
  Elite    < 1 h
  High     < 24 h
  Medium   < 168 h (7 d)
  Low      ≥ 168 h
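The break → fix pairing can be sketched as below. The pairing rule (the first EMERGENCY-tagged commit opens a failure window, the next one closes it) is an assumption about the implementation; the doc only states that consecutive tagged commits are paired:

```python
from datetime import datetime


def recovery_hours(commits: list[tuple[datetime, str]]) -> list[float]:
    """Pair consecutive EMERGENCY-tagged commits and return recovery
    durations in hours (break_commit.timestamp -> fix_commit.timestamp).

    `commits` is a list of (timestamp, message), oldest first.
    """
    durations: list[float] = []
    break_ts: datetime | None = None
    for ts, message in commits:
        if "EMERGENCY" not in message:
            continue
        if break_ts is None:
            break_ts = ts  # break commit opens a failure window
        else:
            durations.append((ts - break_ts).total_seconds() / 3600)
            break_ts = None  # fix commit closes the window
    return durations
```

The median of the returned durations then drives tier classification, per the table above.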

Change Failure Rate

Share of deployments that cause a failure.

  • Formula: failures / total_deployments × 100
  • Unit: percent
  Tier     Threshold
  Elite    < 15 %
  High     < 30 %
  Medium   < 45 %
  Low      ≥ 45 %

Deployment Rework Rate

The fifth canonical metric (added by the CDF, Oct 2025). It measures reactive churn — reverts and hotfixes — as a share of total deploys.

  • Formula: (revert + hotfix commits) / total_deployments × 100
  • Unit: percent
  • Deduplication: a commit matching both revert: and hotfix patterns is counted once.
  Tier     Threshold
  Elite    < 5 %
  High     < 10 %
  Medium   < 20 %
  Low      ≥ 20 %
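The formula and deduplication rule can be sketched as below. The exact revert/hotfix patterns are assumptions — the doc names a `revert:` prefix and "hotfix patterns" without specifying the regexes:

```python
import re

# Illustrative patterns; LumenFlow's actual matchers are not
# documented here.
REVERT = re.compile(r"^revert:", re.IGNORECASE)
HOTFIX = re.compile(r"hotfix", re.IGNORECASE)


def rework_rate(messages: list[str], total_deployments: int) -> float:
    """(revert + hotfix commits) / total_deployments x 100.

    A commit matching both patterns is counted once (the
    deduplication rule above), hence the single `or` test.
    """
    rework = sum(1 for m in messages if REVERT.search(m) or HOTFIX.search(m))
    return rework / total_deployments * 100 if total_deployments else 0.0
```

A message like "revert: hotfix cleanup" matches both patterns but contributes one rework commit, not two.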

In 2023, DORA renamed Mean Time to Recovery to Failed Deployment Recovery Time and narrowed the definition. The rationale:

  1. MTTR conflated deploy failures with unrelated incidents — infra outages, third-party degradations, and platform events were landing in the same bucket as regressions, making the metric useless as a signal about code-change quality.
  2. Mean masked long-tail incidents. A single multi-day incident would drag the mean into the medium/low tier even if the team consistently recovered from normal deploy failures in minutes. DORA standardised on median as the canonical aggregation.

FDRT is the fix: scope = deploy-caused failures only, aggregation = median.

Change Failure Rate catches failures that trigger alerts or user reports. It doesn’t catch the softer signal of teams that ship, quietly revert, and then re-ship a fix — where nothing failed visibly but a deploy effectively didn’t stick. The CDF added Deployment Rework Rate in Oct 2025 to surface this reactive churn.

Together, CFR + Rework Rate give a more complete instability picture:

  • CFR high, Rework low: failures are visible; fix culture is reactive but corrective.
  • CFR low, Rework high: silent instability — lots of “oops, revert, redo” that doesn’t register as incidents.
  • Both low: genuine stability.
  • Both high: systemic problem.
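The four quadrants above can be expressed as a small classifier. The cut-off defaults below reuse the elite thresholds from the tier tables, but where a team draws its own high/low line is a judgment call, not DORA guidance — both the function and its labels are illustrative:

```python
def instability_profile(cfr_pct: float, rework_pct: float,
                        cfr_cut: float = 15.0,
                        rework_cut: float = 5.0) -> str:
    """Map the CFR / Rework Rate quadrants to the labels above."""
    if cfr_pct >= cfr_cut and rework_pct >= rework_cut:
        return "systemic problem"
    if cfr_pct >= cfr_cut:
        return "visible failures, corrective fix culture"
    if rework_pct >= rework_cut:
        return "silent instability"
    return "genuine stability"
```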

DORA canonical guidance:

  Metric                            Aggregation
  Deployment Frequency              mean
  Lead Time for Changes             median
  Failed Deployment Recovery Time   median
  Change Failure Rate               ratio
  Deployment Rework Rate            ratio

Median-over-mean matters because software delivery time distributions are heavy-tailed — one long-running PR or one stubborn incident can lift the mean into a tier that misrepresents the normal case. The median is robust to outliers and answers the question DORA research cares about: what does a typical change look like?
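A worked illustration of the point, using Python's standard library (the numbers are invented for the example):

```python
import statistics

# 19 quick changes plus one multi-day outlier, in hours.
times = [3.0] * 19 + [500.0]

median = statistics.median(times)  # 3.0  -> elite (< 24 h)
mean = statistics.fmean(times)     # 27.85 -> reads as high, not elite
```

One stubborn outlier drags the mean across the 24 h elite boundary even though the typical change took 3 hours — the median reports the typical case correctly.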

LumenFlow follows this exactly:

  • Lead time classification is driven by medianHours.
  • FDRT classification is driven by medianHours.
  • Both metrics still report mean and p90 alongside the median so dashboards can plot all three series.

LumenFlow computes DORA metrics from signals that exist in every repository — no CI integration required. The trade-off is that two metrics are proxies rather than ground-truth production signals:

  • Change Failure Rate uses the .lumenflow/skip-gates-audit.ndjson log as the failure signal. This catches “we shipped despite failing gates” but misses actual production incidents that never touched skip-gates.
  • Failed Deployment Recovery Time pairs consecutive commits containing the EMERGENCY token or fix(EMERGENCY) scope. It catches deploy-failure pairs tagged by the team; it misses recoveries that weren’t explicitly tagged.

Upgrading these to true production incident signals requires CI-side event integration (webhook from the incident tool, annotated deploy events) and is tracked as a follow-up WU — out of scope for the local proxy implementation.