Every CTO makes strategic decisions against a mental model of what good looks like. The 2.8× delivery improvement we measure is our anchor number. This post is about how to find yours.
Author
Tom Bergström
Published
21 May 2026
Topics
nordic-tech, enterprise, scaling
Every CTO operates against an anchor number — a specific metric that has become their shorthand for whether their engineering organisation is working. Some track deployment frequency. Some track mean time to recovery. Some track cycle time. When we built the AI Code Factory, we started measuring delivery speed against a pre-methodology baseline. The number we landed on was 2.8×. That's the floor of improvement we've measured across engagements. This post is about what that number means, how we measure it, and how every CTO can find the equivalent anchor number for their own organisation.
2.8×
Faster delivery — measured floor, not ceiling
Measured across 50+ AI Code Factory engagements. Expecting 3.5–4× within 6 months as the methodology matures.
The 2.8× figure is the ratio of delivery speed — measured in story points completed per developer-week — between an AI Code Factory team running our full methodology (agents, SKILL.md files, hooks, guardrails) and the baseline for the same team or an equivalent team running traditional development practices. It's not a comparison to a struggling team. It's a comparison to a competent team doing standard modern development — GitHub Copilot used ad hoc, code review done manually, coverage enforced at CI but not as a blocking gate, no structured knowledge management.
The 2.8× is a floor because we measure it conservatively. We take the first eight weeks of an engagement, when the SKILL.md library is thin and the team is still calibrating the agents, and use that early period as the basis for the published figure. As the library matures and the agents accumulate codebase-specific knowledge, the ratio increases. Our current observation across mature engagements (six months or more) is 3.2–3.8×. We expect the floor to shift upward as we continue improving the methodology. We call it 2.8× publicly because it's what we can stand behind from sprint one.
"2.8× is the number we use because it's what we can demonstrate from the first sprint. We don't ask clients to trust a projection. We show them the data at week eight and the number is sitting at 2.8× or above. That's a different conversation than promising a number." — Pavel Siddique, CEO, Indpro AB
The measurement protocol works as follows. Before an engagement starts, we ask for 90 days of sprint data from the client's existing team: story points completed per sprint, team size, and test coverage. We normalise for team size to get a per-developer-week velocity figure. We call this the baseline. We then measure the AI Code Factory team's velocity in the same units across the engagement. The 2.8× is the ratio between the engagement team's sprint-8 velocity and the client's 90-day pre-engagement baseline, normalised per developer.
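The protocol above can be sketched in a few lines of Python. This is a minimal illustration, not Indpro's actual tooling: the `Sprint` record, the function names, and every number in it are assumptions made up for the example, and three baseline sprints stand in for the 90 days of client data.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    story_points: float  # points completed in the sprint
    developers: int      # engineers on the team during the sprint
    weeks: float         # sprint length in weeks


def velocity_per_developer_week(sprints: list[Sprint]) -> float:
    """Total points divided by total developer-weeks across the sprints."""
    developer_weeks = sum(s.developers * s.weeks for s in sprints)
    if developer_weeks <= 0:
        raise ValueError("need at least one sprint with developers and weeks > 0")
    return sum(s.story_points for s in sprints) / developer_weeks


def delivery_ratio(baseline: list[Sprint], engagement: list[Sprint]) -> float:
    """Engagement velocity over baseline velocity, both per developer-week."""
    baseline_velocity = velocity_per_developer_week(baseline)
    if baseline_velocity == 0:
        raise ValueError("baseline velocity is zero, ratio is undefined")
    return velocity_per_developer_week(engagement) / baseline_velocity


# Made-up numbers for illustration only, not client data.
baseline_sprints = [Sprint(110, 11, 2), Sprint(105, 11, 2), Sprint(115, 11, 2)]
sprint_8 = [Sprint(112, 4, 2)]
ratio = delivery_ratio(baseline_sprints, sprint_8)
print(f"{ratio:.1f}x")
```

Dividing by developer-weeks rather than by sprint count keeps the figure comparable when the two teams differ in size or sprint length.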
There are limitations in this measurement. Story points are not a perfect unit — they encode complexity estimates that vary by team and change over time. We cross-validate using a second metric: feature deployment frequency (features deployed to production per week). This typically shows a similar ratio — sometimes higher, sometimes slightly lower, but consistently above 2× from sprint four onwards. Where clients allow it, we also count post-merge defect rates and time-to-production for a defined feature scope. These secondary metrics corroborate the story point ratio and give clients multiple views of the same improvement.
The 2.8× figure is useful for Indpro because it's the output of our specific methodology applied consistently across engagements. It's what we can commit to delivering. But for your organisation, the anchor number should be yours — derived from your baseline, your team, your codebase complexity. A CTO who knows their current velocity per developer-week, their review cycle count, their unplanned work percentage, and their deployment frequency has four numbers that describe their system's health. They can track whether decisions they make move those numbers. That's a different kind of management than operating on intuition.
The CTOs we see making the sharpest decisions are the ones who have committed to three to five specific numbers as their engineering health metrics. Not 20 metrics on a dashboard nobody reads, but three to five that they personally review before making a headcount, tooling, or process decision. When the numbers move in the wrong direction, they investigate before assuming a talent explanation. When the numbers move in the right direction after a change, they have evidence that the change worked. This sounds obvious. It's surprisingly rare in practice.
Deployment frequency and MTTR (mean time to recovery) are well-known DORA metrics. They're valuable. But in our experience, the metrics with the most diagnostic value are less commonly tracked.
| Metric | What It Reveals | Healthy Range | Problem Signal |
|---|---|---|---|
| Review cycles per PR | Guardrails and standards quality | 1.2–1.8 | Above 2.5 |
| Test coverage trend | Technical debt trajectory | Stable or rising, above 80% | Declining, below 75% |
| Unplanned work % | Production stability, interruption cost | Below 20% | Above 30% |
| Post-merge defects (7d) | Real quality gate effectiveness | Below 3/sprint | Above 8/sprint |
| Days from spec to first PR | Scaffolding and setup efficiency | Below 2 days | Above 4 days |
| Onboarding days to first commit | Documentation and knowledge quality | Below 14 days | Above 21 days |
These six metrics, pulled from 90 days of data, give a complete picture of whether a delivery system is functioning or struggling. They're all available from existing tooling: GitHub PR data covers four of them (review cycles, coverage trend, post-merge defects, and days from spec to first PR), Jira or Linear covers unplanned work, and your HR system covers onboarding. You don't need new tooling. You need to query the tools you have with intention.
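Once the six numbers are pulled, checking them against the ranges in the table is mechanical. The sketch below is one possible way to do that, under stated assumptions: the dictionary keys and the `classify` helper are hypothetical names, values between the healthy edge and the problem threshold (or exactly on either boundary) are labelled "watch", only the upper edge of the 1.2–1.8 review-cycle range is encoded, and the coverage entry checks the level only, not whether the trend is rising or declining.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    healthy: float                 # edge of the healthy range from the table
    problem: float                 # problem-signal threshold from the table
    higher_is_better: bool = False


# Threshold values copied from the table; the keys are hypothetical names.
THRESHOLDS = {
    "review_cycles_per_pr": Threshold(healthy=1.8, problem=2.5),
    "test_coverage_pct": Threshold(healthy=80, problem=75, higher_is_better=True),
    "unplanned_work_pct": Threshold(healthy=20, problem=30),
    "post_merge_defects_per_sprint": Threshold(healthy=3, problem=8),
    "days_spec_to_first_pr": Threshold(healthy=2, problem=4),
    "onboarding_days_to_first_commit": Threshold(healthy=14, problem=21),
}


def classify(metric: str, value: float) -> str:
    """Return 'healthy', 'watch', or 'problem' for one metric reading."""
    t = THRESHOLDS[metric]
    healthy, problem = t.healthy, t.problem
    if t.higher_is_better:
        # Flip signs so the same lower-is-better comparisons apply.
        value, healthy, problem = -value, -healthy, -problem
    if value > problem:
        return "problem"
    if value < healthy:
        return "healthy"
    return "watch"  # assumption: the table leaves this middle band unnamed


# Made-up readings for illustration only.
readings = {"review_cycles_per_pr": 2.0, "unplanned_work_pct": 34, "test_coverage_pct": 85}
for name, value in readings.items():
    print(name, classify(name, value))
```

A "watch" result is where a threshold conversation is most useful: the number has left the healthy range but has not yet crossed the problem signal.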
What's your current review cycle count? Your coverage trend? If you don't know them off the top of your head, that's useful information. We can help you pull the 90-day diagnostic and benchmark it against these ranges.
We didn't start with a target of 2.8× and build a methodology to hit it. We started with a problem (ad-hoc Copilot usage was producing inconsistent quality, review cycles were long, and the productivity promise of AI-assisted development wasn't showing up in delivery speed) and built the AI Code Factory to solve that problem. SKILL.md files came first, because structured knowledge feeding agents was the missing element. Hooks came second, because deterministic enforcement at commit time was more reliable than advisory reminders. Guardrails came third, because coverage and lint enforcement needed to be blocking, not aspirational. Agents were configured last, because they needed the infrastructure to be useful.
After 50+ engagements, the methodology produces 2.8× as a consistent floor. The number is the result of the system, not the goal. If you build the system — structured knowledge, deterministic hooks, enforced guardrails, calibrated agents — the number follows. Chasing the number directly, without building the system, produces gaming of the metric. That's the pattern we see in companies that adopt AI tools without methodology: local improvements in specific metrics, no change in delivery speed, and often a deterioration in quality.
The question to take to your next leadership meeting: What are our three anchor metrics for engineering health, what are our thresholds for each, and when did we last make a decision based on one of them crossing a threshold? If the answer is uncertain, start there.
We expect the floor to shift. As the SKILL.md library across 50+ engagements continues to mature, as agent calibration improves with more codebase exposure, and as the hook and guardrails infrastructure gets more sophisticated, we expect the measured floor to reach 3.5× within six months. The ceiling for AI-assisted development — the upper bound of what's achievable with the right system, the right codebase, and the right team — is likely higher still. We don't claim a ceiling because we haven't found one yet. What we can commit to, from sprint one, is 2.8× as the floor.
That's the number we're prepared to be held to. What's yours?
Is the 2.8× measured per developer or for the whole team?
Per developer-week, normalised for team size. This matters because many AI Code Factory engagements use smaller teams (four engineers vs. 11 in a traditional team). The per-developer normalisation ensures the comparison is fair and the 2.8× isn't an artefact of team composition.
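A worked example shows why the normalisation matters. The numbers are made up for illustration and are not Indpro data: assume a traditional team of 11 completes 110 story points in a two-week sprint, and an engagement team of four completes 112 in the same two weeks.

```latex
\[
v_{\text{baseline}} = \frac{110}{11 \times 2} = 5.0,
\qquad
v_{\text{engagement}} = \frac{112}{4 \times 2} = 14.0,
\qquad
\frac{v_{\text{engagement}}}{v_{\text{baseline}}} = \frac{14.0}{5.0} = 2.8
\]
```

Compared on raw team totals, the same two sprints give 112 / 110 ≈ 1.02, which would hide almost all of the per-developer difference.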
Has the 2.8× been independently verified?
It hasn't been independently audited by a third party. It's based on Indpro's own data across 50+ client engagements, using the measurement protocol described in this post. We share the underlying data with clients at the end of each engagement. If you'd like to see raw sprint data from a comparable engagement, we can arrange that under NDA.
Does 2.8× apply to all types of engineering work?
It's strongest for feature development work that involves significant scaffolding, boilerplate, and testing. It's weaker for deep architecture decisions, complex debugging, or novel algorithm development — work where human judgment and domain expertise are the primary input and AI scaffolding adds limited value. The AI Code Factory is not a blanket multiplier; it applies to the class of work where structure and automation reduce friction. For many engineering teams, that's 60–70% of total work.

CTO & Co-Founder
Tom leads Indpro's technology strategy and engineering standards. With 20+ years of experience building and leading engineering teams across the Nordic region, he ensures every engagement delivers at the highest technical level.