Flow Metrics & Analytics

LumenFlow tracks flow metrics to help teams identify bottlenecks and improve delivery performance.

The @lumenflow/metrics package provides:

  • DORA Metrics — The dora.dev 2025 canonical 5-metric model
  • Flow Analysis — Bottleneck detection and critical path calculation
  • Telemetry — Event emission for local NDJSON logs and cloud sync

DORA Metrics (2025 canonical 5-metric model)

LumenFlow tracks the five metrics defined by dora.dev, as refreshed by the CDF's October 2025 announcement. Aggregation follows DORA's canonical guidance: mean for deployment frequency, median for lead time and FDRT.

| Metric | Group | Formula | Unit | Aggregation | Target |
| --- | --- | --- | --- | --- | --- |
| Deployment Frequency | Throughput | commits_in_window / days_in_window * 7 | /week | mean | Daily → weekly |
| Lead Time for Changes | Throughput | WU cycle time = completed_at − claimed_at | hours | median | < 24h |
| Failed Deployment Recovery (FDRT) | Throughput | median(time between paired EMERGENCY commits) | hours | median | < 1h |
| Change Failure Rate (CFR) | Instability | failures / total_deployments * 100 | % | ratio | < 15% |
| Deployment Rework Rate | Instability | (revert + hotfix commits) / total_deployments * 100 | % | ratio | < 5% |
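
For intuition, here is a minimal TypeScript sketch of the first two formulas. The WorkUnit interface and both function names are illustrative, not the package's actual API.

```ts
// Illustrative sketch of the two throughput formulas above; the real
// @lumenflow/metrics internals may differ.
interface WorkUnit {
  claimed_at: Date;
  completed_at: Date;
}

// Deployment frequency: commits_in_window / days_in_window * 7, reported per week.
function deploymentFrequency(commitsInWindow: number, daysInWindow: number): number {
  return (commitsInWindow / daysInWindow) * 7;
}

// Lead time for changes: median WU cycle time (completed_at − claimed_at) in hours.
function leadTimeHours(wus: WorkUnit[]): number {
  const cycles = wus
    .map((wu) => (wu.completed_at.getTime() - wu.claimed_at.getTime()) / 3_600_000)
    .sort((a, b) => a - b);
  const mid = Math.floor(cycles.length / 2);
  return cycles.length % 2 ? cycles[mid] : (cycles[mid - 1] + cycles[mid]) / 2;
}
```

To generate these figures from your repository, run:
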
```sh
pnpm metrics:snapshot                 # All metrics, JSON output
pnpm metrics:snapshot --type dora     # DORA metrics only
pnpm metrics:snapshot --days 30       # 30-day window (normalised to per-week)
pnpm metrics:snapshot --dry-run       # Preview; no NDJSON written, no cloud sync
```

Example output:

```text
DORA METRICS (2025 canonical 5-metric model)
Deployment Frequency: 6/week (elite)
Lead Time: 12h median (elite)
Failed Deployment Recovery Time: 0.5h median (elite)
Change Failure Rate: 8% (elite)
Deployment Rework Rate: 3% (elite)
```

When a workspace has a control_plane endpoint configured, DORA records are shipped via POST to <endpoint>/api/v1/telemetry in batches of up to 1000 records. A typical metrics:snapshot run emits five records, one per metric, and fits in a single batch.
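
As a sketch of that batching behaviour (the endpoint path is documented above; the records payload field is an assumption):

```ts
// Hypothetical sync loop: chunk records into batches of up to 1000 and POST
// each batch to the control plane. The payload shape is an assumption.
async function syncBatches(endpoint: string, records: unknown[]): Promise<void> {
  const BATCH_SIZE = 1000;
  for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const res = await fetch(`${endpoint}/api/v1/telemetry`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ records: records.slice(i, i + BATCH_SIZE) }),
    });
    if (!res.ok) throw new Error(`telemetry sync failed: HTTP ${res.status}`);
  }
}
```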

Records are first appended to .lumenflow/telemetry/dora.ndjson, then the cloud sync worker reads from the persisted cursor offset and posts batched payloads. This gives offline resilience: retries resume from the last acknowledged offset.
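
One way such a cursor can work (the NDJSON path is from above; the cursor file name and exact mechanics are assumptions):

```ts
import { promises as fs } from "node:fs";

// Hypothetical resume logic: read NDJSON lines starting at a persisted byte
// offset, so a retry continues from the last acknowledged record.
async function readPending(ndjsonPath: string, cursorPath: string): Promise<string[]> {
  const offset = Number(await fs.readFile(cursorPath, "utf8").catch(() => "0"));
  const buf = await fs.readFile(ndjsonPath);
  return buf
    .subarray(offset)
    .toString("utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0);
}

// After the control plane acknowledges a batch, the cursor is advanced:
async function ack(cursorPath: string, newOffset: number): Promise<void> {
  await fs.writeFile(cursorPath, String(newOffset), "utf8");
}
```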

Every record carries a tags bag that the control plane can slice dashboards by. Values are primitive (string | number | boolean); missing values are omitted rather than emitted as empty strings (see the sketch after the table).

| Tag | Source | Example |
| --- | --- | --- |
| source_type | Hard-coded | "dora" |
| calculated_by | Hard-coded | "metrics:snapshot" |
| tier | Per-metric classification | "elite" |
| repo | git config --get remote.origin.url → parsed owner/repo | "hellmai/lumenflow" |
| branch | git rev-parse --abbrev-ref HEAD | "lane/framework-metrics/wu-2635" |
| commit_sha | git rev-parse HEAD | "deadbeef…" |
| service | workspace.yaml service (or software_delivery.service) | "control-plane" |
| environment | workspace.yaml environment, fallback LUMENFLOW_ENV | "prod" |
| snapshot_window | --days flag | "7d", "30d" |
| pipeline | CI_PIPELINE_NAME, fallback GITHUB_WORKFLOW | "main-ci" |
| deploy_target | DEPLOY_TARGET | "prod-eu" |
| workflow_run_id | GITHUB_RUN_ID, fallback CI_PIPELINE_ID | "987654" |
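
A minimal sketch of the omit-missing behaviour (the builder function is hypothetical):

```ts
type TagValue = string | number | boolean;

// Hypothetical tag-bag builder: undefined entries are dropped rather than
// emitted as empty strings.
function buildTags(raw: Record<string, TagValue | undefined>): Record<string, TagValue> {
  const tags: Record<string, TagValue> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (value !== undefined) tags[key] = value;
  }
  return tags;
}

// Example: DEPLOY_TARGET is unset in a local run, so deploy_target never
// appears in the emitted record.
const tags = buildTags({
  source_type: "dora",
  calculated_by: "metrics:snapshot",
  deploy_target: process.env.DEPLOY_TARGET, // string | undefined
});
```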

Lead time and FDRT records additionally carry aggregation: "median" plus mean_hours and p90_hours, so trend dashboards can plot all three aggregations without re-running the CLI. CFR records carry failures and total_deployments; Deployment Rework Rate records carry rework_commits and total_deployments.
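
For illustration, a lead-time record could look like the following. Only aggregation, mean_hours, p90_hours, and the tag names are documented above; the remaining field names are assumptions about the NDJSON shape.

```ts
// Illustrative NDJSON record (one line in dora.ndjson, shown expanded).
// Field names other than aggregation, mean_hours, and p90_hours are guesses.
const leadTimeRecord = {
  metric: "lead_time_for_changes", // assumed field
  value: 12,                       // the reported median, in hours (assumed field)
  aggregation: "median",
  mean_hours: 14.2,
  p90_hours: 30.5,
  tags: { source_type: "dora", tier: "elite", snapshot_window: "7d" },
};
```

To set up the cloud sync that ships these records: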

```sh
pnpm cloud:connect                        # Interactive OAuth + workspace.yaml scaffolding
pnpm config:get --key control_plane       # Verify endpoint + sync_interval
pnpm metrics:snapshot                     # Emits NDJSON + triggers cloud sync when configured
```

See Workspace spec for the full control_plane schema.

Flow Analysis

```sh
pnpm flow:bottlenecks
```

This analyzes your WU flow to identify:

  • Lane Congestion — Lanes exceeding WIP limits
  • Blocked WUs — Work units waiting on dependencies
  • Stale WUs — WUs in progress for too long
  • Critical Path — WUs blocking the most downstream work (sketched below)
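
The critical-path idea can be sketched as a transitive count over the WU dependency graph. This is illustrative only; the real flow:bottlenecks heuristics may differ.

```ts
// Hypothetical critical-path scoring: for each WU, count how many WUs it
// transitively blocks. `blocks` maps a WU id to the ids it directly blocks.
function downstreamCounts(blocks: Map<string, string[]>): Map<string, number> {
  const memo = new Map<string, Set<string>>();

  function reach(id: string): Set<string> {
    const cached = memo.get(id);
    if (cached) return cached;
    const seen = new Set<string>();
    memo.set(id, seen); // registered early so cycles terminate
    for (const child of blocks.get(id) ?? []) {
      seen.add(child);
      for (const g of reach(child)) seen.add(g);
    }
    return seen;
  }

  return new Map([...blocks.keys()].map((id) => [id, reach(id).size]));
}
```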

Capture point-in-time metrics for dashboards or CI:

```sh
pnpm metrics:snapshot                    # Full snapshot, writes .lumenflow/snapshots/metrics-latest.json
pnpm metrics:snapshot --type dora        # DORA only
pnpm metrics:snapshot --days 30          # 30-day window; value still reported per-week
```

Telemetry

LumenFlow emits structured NDJSON telemetry, mostly under .lumenflow/telemetry/:

| File | Purpose |
| --- | --- |
| .lumenflow/telemetry/gates.ndjson | Gate execution events (duration, pass/fail, WU, lane) |
| .lumenflow/flow.log | WU lifecycle events (wu:claim, wu:prep, wu:done) |
| .lumenflow/telemetry/dora.ndjson | DORA metric records with canonical tag bag |
| .lumenflow/telemetry/costs.ndjson | LLM cost events (model, tokens, USD) |
| .lumenflow/telemetry/llm-classification.ndjson | LLM classification lifecycle events |
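
Since NDJSON is one JSON object per line, any of these files can be consumed with a few lines of script:

```ts
import { promises as fs } from "node:fs";

// Generic NDJSON reader: parse each non-empty line as a JSON record.
async function readNdjson<T>(path: string): Promise<T[]> {
  const text = await fs.readFile(path, "utf8");
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}

const doraRecords = await readNdjson(".lumenflow/telemetry/dora.ndjson");
console.log(`loaded ${doraRecords.length} DORA records`);
```
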
Command reference:

| Command | Description |
| --- | --- |
| pnpm metrics:snapshot | Capture 5-metric DORA snapshot; emits NDJSON + syncs to cloud |
| pnpm flow:report | Generate DORA + gate + WU flow report |
| pnpm flow:bottlenecks | Identify workflow bottlenecks and critical path |
Best Practices

  1. Review metrics weekly

    Schedule a weekly review of flow metrics to identify trends before they become problems.

  2. Set WIP limits appropriately

    If a lane is consistently at 100%+ capacity, consider splitting the lane, adding capacity, or reducing WU scope.

  3. Address blockers quickly

    Blocked WUs create cascading delays. Prioritize unblocking over new work.

  4. Track trends, not absolutes

    DORA research emphasises continuous improvement over hitting specific numbers. Watch the slope, not the intercept.