
DORA Metrics

DORA (DevOps Research and Assessment) metrics measure software delivery performance. They originated in the Accelerate research programme (Forsgren / Humble / Kim) and are maintained today at dora.dev. The CDF announcement of Oct 2025 codified the move from four metrics to the current canonical set of five.

LumenFlow calculates all five locally and — when a control plane endpoint is configured — ships them to the cloud as structured telemetry. For CLI usage, tag taxonomy, and batching semantics, see the advanced metrics guide.

DORA groups the metrics into throughput (how fast good change flows) and instability (how often change causes rework or failure).

Deployment Frequency

How often the team releases code to production.

  • Formula: commits_in_window / days_in_window × 7
  • Unit: deploys per week
  • Aggregation: mean (DORA canonical — median doesn’t make sense for a rate)
  • Normalisation: LumenFlow scales the count to per-week regardless of the --days window so --days 7 and --days 30 produce comparable values.
  Tier     Threshold
  Elite    > 5 / week
  High     1 – 5 / week
  Medium   0.25 – 1 / week
  Low      < 0.25 / week
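The normalisation and tier cut-offs above can be sketched as follows. Function names are illustrative, not LumenFlow's API, and the handling of values that land exactly on a boundary (1 or 0.25) is an assumption — the table does not specify which side wins:

```python
def deploys_per_week(commit_count: int, window_days: int) -> float:
    """Normalise a raw deploy count in a window to a per-week rate.

    Scaling by 7 / window_days makes --days 7 and --days 30
    windows directly comparable, per the formula above.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return commit_count / window_days * 7


def frequency_tier(per_week: float) -> str:
    # Thresholds from the tier table above; exact-boundary
    # behaviour is an assumption.
    if per_week > 5:
        return "elite"
    if per_week >= 1:
        return "high"
    if per_week >= 0.25:
        return "medium"
    return "low"
```

For example, 20 deploys over a 30-day window normalise to roughly 4.67 per week, which lands in the high tier.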

Lead Time for Changes

Time from code change to running in production.

  • Formula: WU cycle time = completed_at − claimed_at
  • Unit: hours
  • Aggregation: median (DORA canonical; mean is skewed by long-tail PRs)
  • LumenFlow also reports mean and p90 so dashboards can track both central tendency and tail behaviour.
  Tier     Threshold (median)
  Elite    < 24 h
  High     < 168 h (7 d)
  Medium   < 720 h (30 d)
  Low      ≥ 720 h
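A minimal sketch of the median / mean / p90 reporting described above. The output key names mirror the `medianHours` field mentioned later in this page; the nearest-rank percentile method is an assumption, since the exact p90 algorithm isn't documented here:

```python
import math
import statistics


def lead_time_stats(cycle_hours: list[float]) -> dict[str, float]:
    """Median, mean, and p90 of WU cycle times (in hours).

    Classification uses the median (DORA canonical); mean and p90
    are reported alongside it for dashboards.
    """
    ordered = sorted(cycle_hours)
    # Nearest-rank p90 (assumption about the percentile method).
    p90_index = max(0, math.ceil(0.9 * len(ordered)) - 1)
    return {
        "medianHours": statistics.median(ordered),
        "meanHours": statistics.fmean(ordered),
        "p90Hours": ordered[p90_index],
    }
```

With cycle times of [2, 4, 6, 8, 400] hours the median is 6 h (elite), while the mean is 84 h — exactly the long-tail skew that motivates median as the canonical aggregation.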

Failed Deployment Recovery Time

Time to recover from a deploy-caused failure.

  • Formula: time between paired EMERGENCY-tagged commits (break_commit.timestamp → fix_commit.timestamp)
  • Unit: hours
  • Aggregation: median (DORA canonical)
  Tier     Threshold (median)
  Elite    < 1 h
  High     < 24 h
  Medium   < 168 h (7 d)
  Low      ≥ 168 h
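The break → fix pairing can be sketched as below. The pairing rule (the first EMERGENCY-tagged commit opens a failure window, the next one closes it) is an assumption about the implementation; the doc only states that consecutive tagged commits are paired:

```python
from datetime import datetime


def recovery_hours(commits: list[tuple[datetime, str]]) -> list[float]:
    """Pair consecutive EMERGENCY-tagged commits and return recovery
    durations in hours (break_commit.timestamp -> fix_commit.timestamp).

    `commits` is a list of (timestamp, message), oldest first.
    """
    durations: list[float] = []
    break_ts: datetime | None = None
    for ts, message in commits:
        if "EMERGENCY" not in message:
            continue
        if break_ts is None:
            break_ts = ts  # break commit opens a failure window
        else:
            durations.append((ts - break_ts).total_seconds() / 3600)
            break_ts = None  # fix commit closes the window
    return durations
```

The median of the returned durations then drives tier classification, per the table above.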

Change Failure Rate

Share of deployments that cause a failure.

  • Formula: failures / total_deployments × 100
  • Unit: percent
  Tier     Threshold
  Elite    < 15 %
  High     < 30 %
  Medium   < 45 %
  Low      ≥ 45 %

Deployment Rework Rate

The fifth canonical metric (added by the CDF, Oct 2025). It measures reactive churn — reverts and hotfixes — as a share of total deploys.

  • Formula: (revert + hotfix commits) / total_deployments × 100
  • Unit: percent
  • Deduplication: a commit matching both revert: and hotfix patterns is counted once.
  Tier     Threshold
  Elite    < 5 %
  High     < 10 %
  Medium   < 20 %
  Low      ≥ 20 %
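The formula and deduplication rule can be sketched as below. The exact revert/hotfix patterns are assumptions — the doc names a `revert:` prefix and "hotfix patterns" without specifying the regexes:

```python
import re

# Illustrative patterns; LumenFlow's actual matchers are not
# documented here.
REVERT = re.compile(r"^revert:", re.IGNORECASE)
HOTFIX = re.compile(r"hotfix", re.IGNORECASE)


def rework_rate(messages: list[str], total_deployments: int) -> float:
    """(revert + hotfix commits) / total_deployments x 100.

    A commit matching both patterns is counted once (the
    deduplication rule above), hence the single `or` test.
    """
    rework = sum(1 for m in messages if REVERT.search(m) or HOTFIX.search(m))
    return rework / total_deployments * 100 if total_deployments else 0.0
```

A message like "revert: hotfix cleanup" matches both patterns but contributes one rework commit, not two.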

In 2023, DORA renamed Mean Time to Recovery to Failed Deployment Recovery Time and narrowed the definition. The rationale:

  1. MTTR conflated deploy failures with unrelated incidents — infra outages, third-party degradations, and platform events were landing in the same bucket as regressions, making the metric useless as a signal about code-change quality.
  2. Mean masked long-tail incidents. A single multi-day incident would drag the mean into the medium/low tier even if the team consistently recovered from normal deploy failures in minutes. DORA standardised on median as the canonical aggregation.

FDRT is the fix: scope = deploy-caused failures only, aggregation = median.

Change Failure Rate catches failures that trigger alerts or user reports. It doesn’t catch the softer signal of teams that ship, quietly revert, and then re-ship a fix — where nothing failed visibly but a deploy effectively didn’t stick. The CDF added Deployment Rework Rate in Oct 2025 to surface this reactive churn.

Together, CFR + Rework Rate give a more complete instability picture:

  • CFR high, Rework low: failures are visible; fix culture is reactive but corrective.
  • CFR low, Rework high: silent instability — lots of “oops, revert, redo” that doesn’t register as incidents.
  • Both low: genuine stability.
  • Both high: systemic problem.
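The four quadrants above can be expressed as a small classifier. The cut-off defaults below reuse the elite thresholds from the tier tables, but where a team draws its own high/low line is a judgment call, not DORA guidance — both the function and its labels are illustrative:

```python
def instability_profile(cfr_pct: float, rework_pct: float,
                        cfr_cut: float = 15.0,
                        rework_cut: float = 5.0) -> str:
    """Map the CFR / Rework Rate quadrants to the labels above."""
    if cfr_pct >= cfr_cut and rework_pct >= rework_cut:
        return "systemic problem"
    if cfr_pct >= cfr_cut:
        return "visible failures, corrective fix culture"
    if rework_pct >= rework_cut:
        return "silent instability"
    return "genuine stability"
```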

DORA canonical guidance:

  Metric                            Aggregation
  Deployment Frequency              mean
  Lead Time for Changes             median
  Failed Deployment Recovery Time   median
  Change Failure Rate               ratio
  Deployment Rework Rate            ratio

Median-over-mean matters because software delivery time distributions are heavy-tailed — one long-running PR or one stubborn incident can lift the mean into a tier that misrepresents the normal case. The median is robust to outliers and answers the question DORA research cares about: what does a typical change look like?
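A worked illustration of the point, using Python's standard library (the numbers are invented for the example):

```python
import statistics

# 19 quick changes plus one multi-day outlier, in hours.
times = [3.0] * 19 + [500.0]

median = statistics.median(times)  # 3.0  -> elite (< 24 h)
mean = statistics.fmean(times)     # 27.85 -> reads as high, not elite
```

One stubborn outlier drags the mean across the 24 h elite boundary even though the typical change took 3 hours — the median reports the typical case correctly.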

LumenFlow follows this exactly:

  • Lead time classification is driven by medianHours.
  • FDRT classification is driven by medianHours.
  • Both metrics still report mean and p90 alongside the median so dashboards can plot all three series.

LumenFlow computes DORA metrics from signals that exist in every repository — no CI integration required. The trade-off is that two metrics are proxies rather than ground-truth production signals:

  • Change Failure Rate uses the .lumenflow/skip-gates-audit.ndjson log as the failure signal. This catches “we shipped despite failing gates” but misses actual production incidents that never touched skip-gates.
  • Failed Deployment Recovery Time pairs consecutive commits containing the EMERGENCY token or fix(EMERGENCY) scope. It catches deploy-failure pairs tagged by the team; it misses recoveries that weren’t explicitly tagged.

Upgrading these to true production incident signals requires CI-side event integration (webhook from the incident tool, annotated deploy events) and is tracked as a follow-up WU — out of scope for the local proxy implementation.