Flow Metrics & Analytics

Name: LumenFlow
Author: The LumenFlow Project

LumenFlow tracks flow metrics to help teams identify bottlenecks and improve delivery performance.

Overview

The @lumenflow/metrics package provides:

DORA Metrics — The dora.dev 2025 canonical 5-metric model
Flow Analysis — Bottleneck detection and critical path calculation
Telemetry — Event emission for local NDJSON logs and cloud sync

DORA Metrics (2025 canonical 5-metric model)

LumenFlow tracks the five metrics defined by dora.dev and refreshed by the CDF Oct 2025 announcement. Aggregation follows DORA canonical guidance: mean for deployment frequency, and median for lead time and Failed Deployment Recovery Time (FDRT). The two instability metrics are reported as ratios.

Metric	Group	Formula	Unit	Aggregation	Target
Deployment Frequency	Throughput	`commits_in_window / days_in_window * 7`	/week	mean	Daily → weekly
Lead Time for Changes	Throughput	WU (work unit) cycle time = completed_at − claimed_at	hours	median	< 24h
Failed Deployment Recovery Time (FDRT)	Throughput	median(time between paired `EMERGENCY` commits)	hours	median	< 1h
Change Failure Rate (CFR)	Instability	`failures / total_deployments * 100`	%	ratio	< 15%
Deployment Rework Rate	Instability	`(revert + hotfix commits) / total_deployments * 100`	%	ratio	< 5%
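
The formulas in the table can be sketched in a few lines. This is an illustrative Python sketch, not the @lumenflow/metrics implementation: the function names are hypothetical, and returning 0% for a window with no deployments is an assumption. FDRT is left out because the pairing rule for `EMERGENCY` commits is not spelled out here.

```python
from statistics import median


def deployment_frequency_per_week(commits_in_window: int, days_in_window: int) -> float:
    """commits_in_window / days_in_window * 7, so any window is reported per week."""
    return commits_in_window / days_in_window * 7


def lead_time_hours(cycle_times_hours: list[float]) -> float:
    """Median WU cycle time (completed_at - claimed_at), in hours."""
    return median(cycle_times_hours)


def ratio_percent(numerator: int, total_deployments: int) -> float:
    """Shared shape of CFR and Deployment Rework Rate: numerator / total * 100."""
    if total_deployments == 0:
        return 0.0  # assumption: an empty window reports 0% instead of raising
    return numerator / total_deployments * 100


# Hypothetical 30-day window: 30 commits, 25 deployments, 2 failures, 1 rework commit.
print(deployment_frequency_per_week(30, 30))
print(lead_time_hours([8.0, 12.0, 30.0]))
print(ratio_percent(2, 25))  # Change Failure Rate
print(ratio_percent(1, 25))  # Deployment Rework Rate
```

Because deployment frequency is always scaled to a 7-day week, a 7-day and a 30-day window produce directly comparable values.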

Viewing DORA Metrics

pnpm metrics:snapshot                 # All metrics, JSON output
pnpm metrics:snapshot --type dora     # DORA metrics only
pnpm metrics:snapshot --days 30       # 30-day window (normalised to per-week)
pnpm metrics:snapshot --dry-run       # Preview; no NDJSON written, no cloud sync

Example output:

DORA METRICS (2025 canonical 5-metric model)
Deployment Frequency: 6/week (elite)
Lead Time: 12h median (elite)
Failed Deployment Recovery Time: 0.5h median (elite)
Change Failure Rate: 8% (elite)
Deployment Rework Rate: 3% (elite)

Cloud Sync

When a workspace has a control_plane endpoint configured, DORA records are shipped to POST <endpoint>/api/v1/telemetry in batches of up to 1000 records. A typical metrics:snapshot run emits 5 records and fits in a single batch.
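
As a rough illustration of the batching rule, the Python sketch below splits a list of records into chunks of at most 1000. The `batches` helper and the `MAX_BATCH_SIZE` name are hypothetical; only the 1000-record ceiling comes from the text above.

```python
from typing import Iterator

MAX_BATCH_SIZE = 1000  # batch ceiling stated in the text


def batches(records: list[dict], size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of `records`, each holding at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# A typical snapshot emits 5 records, which fit in one batch.
snapshot = [{"metric": f"m{i}"} for i in range(5)]
print(len(list(batches(snapshot))))

# A backlog of 2500 staged records would be posted as three batches.
backlog = [{"n": i} for i in range(2500)]
print([len(b) for b in batches(backlog)])
```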

NDJSON staging

Records are first appended to .lumenflow/telemetry/dora.ndjson, then the cloud sync worker reads from the persisted cursor offset and posts batched payloads. This gives offline resilience: retries resume from the last acknowledged offset.
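
The append-then-sync flow can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual sync worker: the cursor is assumed to be a byte offset stored in a separate file, the `post` callback stands in for the HTTP call, and all function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def append_record(log_path: Path, record: dict) -> None:
    """Stage one record as a single NDJSON line."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def sync_from_cursor(
    log_path: Path,
    cursor_path: Path,  # hypothetical file holding the last acknowledged byte offset
    post: Callable[[list[dict]], bool],  # returns True when the batch is acknowledged
    batch_size: int = 1000,
) -> int:
    """Post unsent records in batches; return how many were acknowledged."""
    offset = int(cursor_path.read_text()) if cursor_path.exists() else 0
    acknowledged = 0
    with log_path.open("rb") as fh:
        fh.seek(offset)
        while True:
            batch, batch_end = [], offset
            while len(batch) < batch_size:
                line = fh.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a line that is still being written
                batch.append(json.loads(line))
                batch_end = fh.tell()
            if not batch or not post(batch):
                break  # nothing left, or the post failed: retry later from the cursor
            offset = batch_end
            cursor_path.write_text(str(offset))  # persist only after acknowledgement
            acknowledged += len(batch)
    return acknowledged
```

Because the cursor advances only after a batch is acknowledged, a failed or interrupted run leaves it untouched and the next run re-sends from the same offset.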

Tag taxonomy

Every record carries a tags bag the control plane can slice dashboards by. Values are primitive (string | number | boolean); missing values are omitted rather than emitted as empty strings.

Tag	Source	Example
`source_type`	Hard-coded	`"dora"`
`calculated_by`	Hard-coded	`"metrics:snapshot"`
`tier`	Per-metric classification	`"elite"`
`repo`	`git config --get remote.origin.url` → parsed `owner/repo`	`"hellmai/lumenflow"`
`branch`	`git rev-parse --abbrev-ref HEAD`	`"lane/framework-metrics/wu-2635"`
`commit_sha`	`git rev-parse HEAD`	`"deadbeef…"`
`service`	`workspace.yaml` → `service` (or `software_delivery.service`)	`"control-plane"`
`environment`	`workspace.yaml` → `environment`, fallback `LUMENFLOW_ENV`	`"prod"`
`snapshot_window`	`--days` flag	`"7d"`, `"30d"`
`pipeline`	`CI_PIPELINE_NAME`, fallback `GITHUB_WORKFLOW`	`"main-ci"`
`deploy_target`	`DEPLOY_TARGET`	`"prod-eu"`
`workflow_run_id`	`GITHUB_RUN_ID`, fallback `CI_PIPELINE_ID`	`"987654"`
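
A tag bag that follows these rules could be assembled roughly like this. The Python sketch is illustrative only: `build_tags` and `parse_owner_repo` are hypothetical helpers, and the URL parsing is an assumption about how `owner/repo` is derived from the remote.

```python
import re
from typing import Optional, Union

TagValue = Union[str, int, float, bool]


def parse_owner_repo(remote_url: str) -> Optional[str]:
    """Extract owner/repo from an HTTPS or SSH Git remote URL, if present."""
    match = re.search(r"[:/]([^/:]+/[^/]+?)(?:\.git)?/?$", remote_url.strip())
    return match.group(1) if match else None


def build_tags(**candidates: Optional[TagValue]) -> dict[str, TagValue]:
    """Keep primitive values only; omit missing values and empty strings."""
    return {
        key: value
        for key, value in candidates.items()
        if isinstance(value, (str, int, float, bool)) and value != ""
    }


tags = build_tags(
    source_type="dora",
    calculated_by="metrics:snapshot",
    repo=parse_owner_repo("git@github.com:hellmai/lumenflow.git"),
    snapshot_window="7d",
    deploy_target=None,  # e.g. DEPLOY_TARGET unset, so the tag is omitted
    pipeline="",         # empty strings are dropped rather than emitted
)
print(tags)
```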

Lead time and FDRT records additionally carry aggregation: "median", mean_hours, and p90_hours so trend dashboards can plot all three aggregations without re-running the CLI. CFR records carry failures + total_deployments; Deployment Rework Rate carries rework_commits + total_deployments.
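
To make the three aggregations concrete, here is a small Python sketch of the extra fields on a lead time record. The `value` field name, the helper names, and the nearest-rank percentile method are assumptions; the text above only names `aggregation`, `mean_hours`, and `p90_hours`.

```python
from statistics import mean, median


def p90_nearest_rank(values: list[float]) -> float:
    """90th percentile by nearest rank: the value at position ceil(0.9 * n)."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # integer form of ceil(0.9 * n)
    return ordered[rank - 1]


def lead_time_fields(cycle_times_hours: list[float]) -> dict:
    """Headline median plus the mean and p90 carried for trend dashboards."""
    return {
        "aggregation": "median",
        "value": median(cycle_times_hours),  # hypothetical field name
        "mean_hours": mean(cycle_times_hours),
        "p90_hours": p90_nearest_rank(cycle_times_hours),
    }


# One slow outlier (100h) pulls the mean up while the median barely moves.
sample = [2, 4, 6, 8, 10, 12, 14, 16, 18, 100]
print(lead_time_fields(sample))
```

Carrying all three values shows why the median is the headline figure: a single long-running WU shifts the mean noticeably but leaves the median almost unchanged.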

Enabling cloud sync

pnpm cloud:connect                        # Interactive OAuth + workspace.yaml scaffolding
pnpm config:get --key control_plane       # Verify endpoint + sync_interval
pnpm metrics:snapshot                     # Emits NDJSON + triggers cloud sync when configured

See Workspace spec for the full control_plane schema.

Flow Analysis

Identifying Bottlenecks

pnpm flow:bottlenecks

This analyzes your WU flow to identify:

Lane Congestion — Lanes exceeding WIP limits
Blocked WUs — Work units waiting on dependencies
Stale WUs — WUs in progress for too long
Critical Path — WUs blocking the most downstream work
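
Two of these checks can be sketched in Python to show the idea. This is not the `flow:bottlenecks` implementation: the input shapes (per-lane in-progress counts, a `blocked_by` map) and the function names are hypothetical, and "critical path" is approximated here as the number of WUs that transitively depend on each WU.

```python
from collections import defaultdict


def congested_lanes(in_progress: dict[str, int], wip_limits: dict[str, int]) -> list[str]:
    """Lanes whose in-progress WU count exceeds the lane's WIP limit."""
    return sorted(
        lane for lane, count in in_progress.items()
        if count > wip_limits.get(lane, float("inf"))  # no limit means never congested
    )


def downstream_counts(blocked_by: dict[str, list[str]]) -> dict[str, int]:
    """For each WU, count the WUs that directly or transitively wait on it."""
    dependents = defaultdict(set)
    for wu, blockers in blocked_by.items():
        for blocker in blockers:
            dependents[blocker].add(wu)

    def reach(start: str) -> set:
        seen, stack = set(), [start]
        while stack:
            for child in dependents.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    all_wus = set(blocked_by) | set(dependents)
    return {wu: len(reach(wu)) for wu in all_wus}


# Hypothetical flow: WU-1 blocks WU-2 and WU-4; WU-2 blocks WU-3.
counts = downstream_counts({"WU-2": ["WU-1"], "WU-3": ["WU-2"], "WU-4": ["WU-1"]})
print(max(counts, key=counts.get))
print(congested_lanes({"framework": 3, "ops": 1}, {"framework": 2, "ops": 2}))
```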

Metrics Snapshots

Capture point-in-time metrics for dashboards or CI:

pnpm metrics:snapshot                    # Full snapshot, writes .lumenflow/snapshots/metrics-latest.json
pnpm metrics:snapshot --type dora        # DORA only
pnpm metrics:snapshot --days 30          # 30-day window; value still reported per-week

Telemetry Events

LumenFlow writes structured telemetry under .lumenflow/. Most streams are NDJSON files in .lumenflow/telemetry/; WU lifecycle events go to .lumenflow/flow.log:

File	Purpose
`.lumenflow/telemetry/gates.ndjson`	Gate execution events (duration, pass/fail, WU, lane)
`.lumenflow/flow.log`	WU lifecycle events (`wu:claim`, `wu:prep`, `wu:done`)
`.lumenflow/telemetry/dora.ndjson`	DORA metric records with canonical tag bag
`.lumenflow/telemetry/costs.ndjson`	LLM cost events (model, tokens, USD)
`.lumenflow/telemetry/llm-classification.ndjson`	LLM classification lifecycle events
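
Because NDJSON stores one JSON object per line, these files are easy to post-process. The Python sketch below computes a gate pass rate per lane; the `lane` and `passed` field names are assumptions for illustration, since the exact event schema is not listed here.

```python
import json
from collections import Counter
from typing import Iterable


def gate_pass_rate_by_lane(lines: Iterable[str]) -> dict[str, float]:
    """Fraction of gate runs that passed, grouped by lane."""
    runs, passes = Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        runs[event["lane"]] += 1                    # hypothetical field name
        passes[event["lane"]] += bool(event["passed"])  # hypothetical field name
    return {lane: passes[lane] / runs[lane] for lane in runs}


# Inline sample instead of reading .lumenflow/telemetry/gates.ndjson from disk.
sample_lines = [
    '{"lane": "framework", "passed": true, "duration_ms": 4200}',
    '{"lane": "framework", "passed": false, "duration_ms": 3900}',
    '{"lane": "ops", "passed": true, "duration_ms": 1200}',
]
print(gate_pass_rate_by_lane(sample_lines))
```

An open file object can be passed directly as `lines`, since iterating a text file yields one line at a time.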

CLI Reference

Command	Description
`pnpm metrics:snapshot`	Capture a metrics snapshot (all metrics by default, `--type dora` for DORA only); emits NDJSON and syncs to cloud when control_plane is configured
`pnpm flow:report`	Generate DORA + gate + WU flow report
`pnpm flow:bottlenecks`	Identify workflow bottlenecks and critical path

Best Practices

Review metrics weekly

Schedule a weekly review of flow metrics to identify trends before they become problems.

Set WIP limits appropriately

If a lane is consistently at 100%+ capacity, consider splitting the lane, adding capacity, or reducing WU scope.

Address blockers quickly

Blocked WUs create cascading delays. Prioritize unblocking over new work.

Track trends, not absolutes

DORA research emphasises continuous improvement over hitting specific numbers. Watch the slope, not the intercept.
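
One way to "watch the slope" is to fit a straight line through successive weekly snapshots and look at its gradient. The Python sketch below is a plain least-squares fit over equally spaced weeks; `trend_slope` is a hypothetical helper, not a LumenFlow command or API.

```python
def trend_slope(weekly_values: list[float]) -> float:
    """Least-squares slope of a metric over equally spaced weeks (units per week)."""
    n = len(weekly_values)
    if n < 2:
        return 0.0  # a single snapshot has no trend
    x_mean = (n - 1) / 2
    y_mean = sum(weekly_values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(weekly_values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical median lead time (hours) over four weekly snapshots.
lead_time_by_week = [30.0, 26.0, 22.0, 18.0]
print(trend_slope(lead_time_by_week))  # negative slope: lead time is shrinking
```

A team whose lead time is still above the 24h target but has a clearly negative slope is improving, which is the signal this practice asks you to watch.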

Next Steps

DORA Metrics (concept) — formal definitions, formulas, rationale
Initiatives — multi-phase project coordination
Team Workflow — team practices and conventions
Workspace spec — control_plane configuration