The Illusion of Progress: Why Most Metrics Are Theater, Not Truth

product-strategy · metrics · engineering-culture · leadership · measurement · vanity-metrics
[Header image: a dashboard of climbing graphs set against flat business outcomes, illustrating the disconnect between vanity metrics and real impact]

A strategic guide for product and technology leaders on distinguishing signal from noise


The Dashboard Delusion

Every quarter, leadership teams gather around polished dashboards filled with climbing curves and color-coded KPIs. Open rates are up 23%. Lines of code shipped increased 40%. Customer engagement metrics show green across the board. Everyone nods approvingly. The business must be thriving.

Except it isn't.

Revenue is flat. Churn is accelerating. The product roadmap is a graveyard of features nobody uses. The disconnect between what gets measured and what actually matters has become the silent killer of modern product organizations.

[Chart: daily email open rate vs. click rate over one month]

According to the above chart, open rates fluctuate wildly between 40% and 80% over a single month, while click rates—the metric that actually indicates user intent—barely register above 10%. This pattern reveals a fundamental truth: the metrics we celebrate most loudly are often the ones that matter least.

This isn't just a technology problem. It's a systemic failure of measurement philosophy that spans industries, from how central banks calculate inflation to how engineering teams justify their existence.


The Anatomy of Vanity Metrics

What Makes a Metric Useless?

A vanity metric shares three characteristics:

  1. It moves independently of business outcomes — You can optimize it without improving revenue, retention, or competitive position
  2. It's easily manipulated without creating value — Teams can game it through tactical tricks that don't serve customers
  3. It lacks causal connection to decision-making — Knowing the number doesn't tell you what to do next

The email open rate exemplifies all three.

Consider the email data: open rates spike to 80%+ on certain days. What does that tell a product leader? Almost nothing. Were those emails opened by engaged users or bots? Did recipients immediately delete them? Did the content drive any meaningful action? The metric itself is silent on what matters.

Meanwhile, click rates—which indicate actual user interest and intent—remain stubbornly low. This is the metric that correlates with conversion, with engagement, with revenue. Yet most email marketing reports lead with open rates because they look better in presentations.

As Eric Ries articulated in The Lean Startup, vanity metrics are "the numbers you want to publish on TechCrunch to make your company look good." Actionable metrics, by contrast, are "the numbers that actually help you make decisions."

The Technical Equivalent: Lines of Code

In engineering organizations, lines of code (LOC) serves the same theatrical purpose as email open rates.

A team ships 50,000 lines of code in a quarter. Is that good? It depends entirely on what those lines accomplish:

  • Did they enable a new revenue stream?
  • Did they reduce system complexity or increase it?
  • Did they improve performance, reliability, or user experience?
  • Or did they simply add technical debt that will slow future development?

LOC is a volume metric masquerading as a value metric. It measures activity, not impact. Yet engineering leaders continue to report it because it's easy to calculate and sounds impressive to non-technical stakeholders.

The better question is never "How much code did we write?" but rather "What outcomes did our engineering work enable?"

Stripe's engineering culture, as documented by Patrick Collison and others, explicitly rejects LOC-based thinking in favor of outcome-oriented metrics like deployment frequency, time-to-resolution for critical bugs, and feature adoption rates. These metrics connect technical work directly to business results.


Beyond Technology: The Measurement Crisis Everywhere

Consumer Price Index: Measuring the Wrong Inflation

The metrics problem extends far beyond product and engineering. Consider how governments measure inflation through the Consumer Price Index (CPI).

CPI tracks a "basket" of goods and services, weighted by how much the average household supposedly spends on each category. But the methodology contains several critical flaws:

  1. Substitution bias — When beef prices rise, CPI assumes consumers switch to chicken, thus understating the real cost increase
  2. Hedonic adjustment — A new laptop with better specs is treated as "more value for the same price," even if consumers still pay more
  3. Housing weight distortion — Housing costs, the largest expense for most households, are measured through "owner's equivalent rent" rather than actual home prices or mortgage costs

The result? Official inflation figures often diverge dramatically from what people actually experience. According to research by the Billion Prices Project at MIT, real-time price tracking across online retailers frequently shows inflation rates 2-3 percentage points higher than official CPI during volatile periods.

For policymakers, this matters enormously. Interest rate decisions, wage negotiations, and social security adjustments all hinge on CPI. When the metric fails to capture reality, every downstream decision becomes flawed.

The parallel to product metrics is exact: when leadership optimizes for the wrong measurements, every strategic choice becomes suspect.

Hospital Readmission Rates: Gaming Healthcare Metrics

Healthcare provides another cautionary tale. In 2012, the Affordable Care Act introduced financial penalties for hospitals with high 30-day readmission rates, intending to improve care quality.

The metric seemed logical: if patients return to the hospital shortly after discharge, something must have gone wrong. Lower readmissions must mean better care.

Except hospitals quickly learned to game the system. Instead of improving care, many simply:

  • Held patients in "observation status" rather than formally admitting them, so their return wouldn't count as a readmission
  • Diverted returning patients to emergency departments for extended observation
  • Discharged patients to skilled nursing facilities to restart the 30-day clock

A 2019 study in JAMA Internal Medicine found that while readmission rates declined, overall mortality rates increased—suggesting that hospitals were avoiding readmissions at the expense of actual patient outcomes.

The metric became the goal, replacing the actual goal of better patient health.


The Intent Behind the Metric: What Are We Really Trying to Learn?

Every useful metric begins with a question worth answering. Vanity metrics emerge when organizations lose sight of the question and simply measure what's convenient.

The Framework: Outcome → Behavior → Proxy

Start with the outcome you care about:

  • Revenue growth
  • Customer retention
  • Competitive differentiation
  • Market expansion

Identify the behaviors that drive that outcome:

  • Users completing core workflows
  • Customers referring others
  • Teams shipping features that get adopted
  • Systems handling scale without degradation

Choose proxies that reliably indicate those behaviors:

  • Feature engagement rates (not just page views)
  • Net revenue retention (not just gross signups)
  • Deployment frequency paired with error rates (not just velocity)
  • Customer effort scores (not just satisfaction surveys)

The email chart above demonstrates this distinction perfectly. Open rates are a proxy for email delivery and subject line effectiveness—useful for diagnosing technical issues, but disconnected from business outcomes. Click rates, while imperfect, are at least one step closer to measuring actual user interest and intent.

But even click rates don't tell the full story. What matters is what happens after the click: Did the user complete a purchase? Sign up for a trial? Share the content? These downstream outcomes are harder to measure but infinitely more valuable.

Goodhart's Law: When Metrics Become Targets

British economist Charles Goodhart observed: "When a measure becomes a target, it ceases to be a good measure."

This happens because humans are extraordinarily good at optimizing for whatever gets measured, regardless of whether that optimization creates real value.

Examples abound:

  • Teachers "teaching to the test" rather than fostering genuine learning
  • Sales teams closing bad-fit customers to hit quarterly quotas
  • Engineers splitting tickets to inflate velocity scores
  • Marketers buying email lists to boost subscriber counts

The solution isn't to stop measuring—it's to measure what's harder to fake. Revenue is harder to fake than signups. Customer retention is harder to fake than acquisition. Code that ships to production and gets used is harder to fake than code that passes code review.

Amazon's approach to metrics, as described in Colin Bryar and Bill Carr's Working Backwards, emphasizes "controllable input metrics" over "output metrics." Rather than obsessing over revenue (an output), teams focus on metrics they can directly influence through product decisions—like selection breadth, delivery speed, and price competitiveness—that have proven causal relationships to revenue.


How Metrics Are Computed: The Hidden Assumptions

Most metrics contain invisible assumptions that determine their usefulness. Understanding how a metric is calculated reveals what it actually measures versus what it claims to measure.

Email Open Rates: A Technical Mirage

Email open rates are calculated by embedding a tiny invisible image (a tracking pixel) in the email. When the recipient's email client loads that image, it's counted as an "open."
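To make the mechanism concrete, here's a minimal sketch of what a tracking-pixel endpoint looks like, assuming a hypothetical Flask service and message IDs. It records an "open" for anything that fetches the image, which is exactly why proxy prefetching and bot scans get counted:

```python
# Minimal tracking-pixel endpoint (illustrative sketch, not production code).
# An "open" is recorded whenever anything fetches the image: a human,
# Apple's privacy proxy, a Gmail image cache, or a spam scanner.
from datetime import datetime, timezone
from flask import Flask, Response, request

app = Flask(__name__)

# Smallest commonly used transparent 1x1 GIF payload.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff!"
         b"\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

@app.route("/pixel/<message_id>.gif")
def track_open(message_id):
    # Everything that loads the image looks identical from this side.
    print(datetime.now(timezone.utc).isoformat(), "open", message_id,
          request.headers.get("User-Agent", "unknown"))
    return Response(PIXEL, mimetype="image/gif")
```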

But this methodology has massive blind spots:

  • Apple Mail Privacy Protection (introduced in 2021) pre-loads all images on Apple's servers, registering an "open" even if the user never looks at the email
  • Gmail image proxying caches images, making it impossible to distinguish multiple opens from a single open
  • Text-only email clients never load images, undercounting engaged users who prefer plain text
  • Bots and spam filters trigger opens during automated scanning

According to Litmus Email Analytics, Apple's privacy changes alone inflated average open rates by 10-15 percentage points industry-wide. The metric didn't change—its underlying meaning did.

This is why the chart's wild open-rate fluctuations tell us almost nothing. Are the spikes real engagement or Apple's pre-fetching? We can't know from the metric alone.

Lines of Code: The Complexity Trap

LOC suffers from similar computational problems. How should it be counted?

  • Do comments count?
  • Do blank lines count?
  • Does generated code count?
  • Does deleting code count as negative LOC or not at all?
  • How do you compare 100 lines of Python to 100 lines of assembly?
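Even a toy counter makes the ambiguity concrete. Here's a minimal sketch that counts the same snippet three different ways; both the snippet and the counting rules are hypothetical, and their arbitrariness is the point:

```python
# A tiny hypothetical file, counted under three equally defensible rules.
SAMPLE = '''\
# Fetch a user record.
def get_user(user_id):

    """Return the user, or None if missing."""
    return USERS.get(user_id)
'''

def count_loc(source, skip_blank=False, skip_comments=False):
    total = 0
    for line in source.splitlines():
        stripped = line.strip()
        if skip_blank and not stripped:
            continue
        if skip_comments and stripped.startswith("#"):
            continue
        total += 1
    return total

print("raw lines:        ", count_loc(SAMPLE))                   # 5
print("non-blank lines:  ", count_loc(SAMPLE, skip_blank=True))  # 4
print("non-comment lines:", count_loc(SAMPLE, skip_blank=True,
                                       skip_comments=True))      # 3
```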

More fundamentally: code is a liability, not an asset. Every line added increases maintenance burden, expands the attack surface, and creates opportunities for bugs. The best engineers often ship net-negative lines of code by simplifying systems.

John Carmack, legendary game developer and CTO of Oculus, once noted: "The best code is no code at all. Every line you write is a line you have to debug, a line you have to maintain, and a line that can break."

Yet most engineering metrics reward code creation, not code elimination.

Inflation Measurement: The Substitution Illusion

CPI's substitution bias reveals how computational choices embed philosophical assumptions.

When steak prices rise 20% and chicken prices rise only 5%, CPI assumes consumers substitute chicken for steak, calculating inflation somewhere between the two. This assumes:

  1. Consumers are perfectly rational economic actors
  2. Steak and chicken are functionally equivalent
  3. Quality of life isn't diminished by forced substitution

All three assumptions are questionable. Yet they're baked into the metric's calculation, invisible to anyone just reading the final inflation number.
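A stripped-down calculation makes the effect visible. This is a simplified illustration rather than the actual CPI methodology: it compares a fixed-basket index against a geometric-mean index that implicitly allows substitution, using the hypothetical steak-and-chicken numbers above:

```python
# Hypothetical price changes: steak +20%, chicken +5%, equal spending shares.
relatives = {"steak": 1.20, "chicken": 1.05}
weights = {"steak": 0.5, "chicken": 0.5}

# Fixed basket (no substitution): weighted arithmetic mean of price relatives.
fixed_basket = sum(weights[g] * relatives[g] for g in relatives)

# Substitution-style index: weighted geometric mean, which assumes consumers
# shift spending toward the item whose price rose less.
geometric = 1.0
for g in relatives:
    geometric *= relatives[g] ** weights[g]

print(f"fixed-basket inflation: {fixed_basket - 1:.2%}")  # 12.50%
print(f"substitution-adjusted:  {geometric - 1:.2%}")     # ~12.25%
```

The gap looks small for a single period, but it compounds, and every downstream decision indexed to the lower number inherits the assumption.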

The same happens with product metrics. A "daily active user" might be defined as "anyone who opens the app," but that definition assumes opening the app equals meaningful engagement—an assumption that may not hold for apps people open accidentally or out of habit without actually using features.
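The same sleight of hand can be made explicit in a few lines. Here's a minimal sketch with a hypothetical event log; both "daily active user" counts use identical data and differ only in what counts as active:

```python
# Hypothetical event log for one day: (user_id, event_name).
events = [
    ("u1", "app_open"), ("u1", "complete_core_workflow"),
    ("u2", "app_open"),                      # opened, did nothing else
    ("u3", "app_open"), ("u3", "view_settings"),
    ("u4", "app_open"), ("u4", "complete_core_workflow"),
]

# Definition A: anyone who opened the app is "active".
dau_opens = {u for u, e in events if e == "app_open"}

# Definition B: only users who completed the core workflow are "active".
dau_engaged = {u for u, e in events if e == "complete_core_workflow"}

print("DAU (app opens):    ", len(dau_opens))    # 4
print("DAU (core workflow):", len(dau_engaged))  # 2
```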


Building Better Measurement Systems

Principle 1: Measure Outcomes, Not Outputs

Outputs are activities. Outcomes are results that matter to the business.

Bad: Number of features shipped
Good: Percentage of features that achieve adoption targets

Bad: Customer support tickets closed
Good: Customer effort score and resolution satisfaction

Bad: Code coverage percentage
Good: Production incident frequency and mean time to recovery

Amplitude CEO Spenser Skates describes this as the shift from "vanity metrics to North Star metrics"—singular outcome-oriented measures that align the entire organization around what actually drives growth.

Principle 2: Pair Leading and Lagging Indicators

Lagging indicators tell you what happened. Leading indicators tell you what's about to happen.

Revenue is a lagging indicator—by the time it moves, the underlying business reality has already shifted. Feature adoption, trial-to-paid conversion, and customer engagement are leading indicators that predict future revenue.

The best measurement systems pair both:

  • Lagging: Monthly recurring revenue
  • Leading: Trial activation rate, feature adoption depth, expansion revenue pipeline

This pairing allows teams to spot problems early while still tracking ultimate outcomes.

Principle 3: Make Metrics Falsifiable

A good metric should be specific enough that it can be proven wrong. "User engagement is increasing" is unfalsifiable—you can always redefine engagement to make it true. "75% of new users complete the core workflow within 7 days" is falsifiable and therefore useful.

Karl Popper's philosophy of science applies directly to metrics: a hypothesis that can't be proven false isn't scientific. A metric that can't be proven wrong isn't actionable.
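A falsifiable metric can literally be written as a check that passes or fails. Here's a minimal sketch, assuming hypothetical signup and workflow-completion timestamps:

```python
from datetime import datetime, timedelta

# Hypothetical data: when each new user signed up, and when (if ever) they
# first completed the core workflow.
signups = {
    "u1": datetime(2024, 3, 1), "u2": datetime(2024, 3, 2),
    "u3": datetime(2024, 3, 3), "u4": datetime(2024, 3, 4),
}
completions = {"u1": datetime(2024, 3, 4), "u3": datetime(2024, 3, 12)}

def within_7_days(user):
    done = completions.get(user)
    return done is not None and done - signups[user] <= timedelta(days=7)

rate = sum(within_7_days(u) for u in signups) / len(signups)
print(f"7-day core-workflow completion: {rate:.0%}")            # 25%
print("claim '>= 75% within 7 days':", rate >= 0.75)            # False: falsified
```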

Principle 4: Instrument the Full Funnel

Vanity metrics often measure the top of the funnel while ignoring what happens downstream.

Email open rates measure whether someone saw your subject line. But the full funnel includes:

  • Open rate (did they see it?)
  • Click rate (did they engage?)
  • Conversion rate (did they take the desired action?)
  • Retention rate (did they come back?)

The email chart shows exactly this problem: high open rates with abysmal click rates signal a disconnect between getting attention and earning engagement.
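In practice, instrumenting the funnel means reporting each stage as a share of the stage before it, not just the top-line number. Here's a minimal sketch with hypothetical campaign counts:

```python
# Hypothetical counts for one email campaign, from delivery to retention.
funnel = [
    ("delivered", 10_000),
    ("opened",     6_500),   # inflated by proxy prefetching, as noted above
    ("clicked",      700),
    ("converted",    120),
    ("retained_30d",  80),
]

# Step conversion: each stage as a share of the stage before it.
for (prev_name, prev_n), (name, n) in zip(funnel, funnel[1:]):
    print(f"{prev_name:>13} -> {name:<13} {n / prev_n:6.1%}")

# Overall: how much of the top of the funnel becomes a lasting outcome.
print(f"{'delivered':>13} -> {'retained_30d':<13} "
      f"{funnel[-1][1] / funnel[0][1]:6.1%}")
```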

Similarly, engineering velocity means nothing without:

  • Deployment success rate
  • Feature adoption rate
  • System reliability metrics
  • Customer satisfaction impact

Measure the whole system, not just the convenient part.


Real-World Case Studies

Netflix: From Viewing Hours to Retention

Netflix famously shifted from measuring "viewing hours" (a vanity metric) to "retention rate" (an outcome metric).

Viewing hours could be gamed by auto-playing content or by acquiring users who binged one show then canceled. Retention, by contrast, directly correlates with lifetime value and business sustainability.

This shift changed everything about how Netflix built product. Features weren't evaluated on whether they increased viewing hours, but whether they improved retention—a much harder bar to clear and a much more honest measure of value creation.

Stripe: API Error Rates Over Feature Count

Stripe's engineering culture obsesses over API reliability and developer experience metrics rather than feature velocity.

Their key metrics include:

  • API success rate (uptime isn't enough—requests must succeed)
  • Time to first successful API call for new developers
  • Payment acceptance rate across different countries and payment methods

These metrics directly connect to business outcomes: more reliable APIs mean more transactions processed, which means more revenue. Feature count, by contrast, is irrelevant if the core API doesn't work flawlessly.

Superhuman: Product-Market Fit Score

Superhuman, the email client, famously developed a quantitative approach to measuring product-market fit based on Sean Ellis's question: "How would you feel if you could no longer use this product?"

They measure the percentage of users who answer "very disappointed" and use 40% as the threshold for product-market fit. This metric is:

  • Outcome-oriented: It predicts retention and word-of-mouth growth
  • Falsifiable: You can miss the threshold
  • Actionable: You can survey specific user segments to understand what drives disappointment vs. delight

It's the opposite of a vanity metric—it's designed to reveal hard truths, not to make the team feel good.
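The computation itself is deliberately simple, which is part of its appeal. Here's a minimal sketch, assuming hypothetical responses to the Sean Ellis question:

```python
# Hypothetical responses to "How would you feel if you could no longer
# use this product?"
responses = (
    ["very disappointed"] * 46 +
    ["somewhat disappointed"] * 38 +
    ["not disappointed"] * 16
)

pmf_score = responses.count("very disappointed") / len(responses)
print(f"PMF score: {pmf_score:.0%}  (threshold: 40%)")
print("Product-market fit signal:", pmf_score >= 0.40)
```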


The Organizational Cost of Bad Metrics

Vanity metrics don't just waste time—they actively damage organizations by:

1. Misallocating Resources

When teams optimize for the wrong metrics, they invest in work that doesn't matter. Engineers build features nobody wants. Marketers chase channels that don't convert. Product teams prioritize the loud minority over the valuable majority.

2. Creating Perverse Incentives

What gets measured gets managed—and gamed. When metrics become targets, teams find creative ways to hit the numbers without creating value. The result is organizational theater: everyone performing success while the business stagnates.

3. Obscuring Real Problems

Vanity metrics provide false comfort. Leadership believes things are improving because the dashboard is green, while underlying fundamentals deteriorate. By the time the crisis becomes obvious, it's often too late to course-correct.

4. Eroding Trust

When teams know the metrics don't reflect reality, they lose faith in leadership's judgment. Cynicism spreads. The best people leave. Those who remain learn to play the game rather than do the work.


Actionable Takeaways for Product and Technology Leaders

1. Audit Your Metrics Against Outcomes

For every metric your organization tracks, ask:

  • What business outcome does this predict or measure?
  • Can we improve this metric without improving that outcome?

If the answer to the second question is yes, it's probably a vanity metric.

2. Implement Metric Hierarchies

Structure your measurement system in layers:

  • North Star Metric: The single outcome that matters most (e.g., revenue retention, daily active engaged users)
  • Input Metrics: The 3-5 behaviors that drive the North Star
  • Diagnostic Metrics: The operational indicators that help you debug problems

Everything else is noise.
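Written down, the hierarchy might be as small as this; a minimal sketch with hypothetical metric names, not a recommended set:

```python
# Hypothetical metric hierarchy for a subscription product.
metric_hierarchy = {
    "north_star": "net revenue retention",
    "inputs": [                  # behaviors teams can directly influence
        "weekly core-workflow completion rate",
        "trial-to-paid conversion",
        "time to first value for new accounts",
    ],
    "diagnostics": [             # for debugging movements in the inputs
        "onboarding step drop-off",
        "p95 page load time",
        "support contact rate per active account",
    ],
}

for layer, metrics in metric_hierarchy.items():
    print(layer, "->", metrics)
```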

3. Pair Metrics With Counterfactuals

Never report a metric without its complement:

  • Open rates with click rates
  • Velocity with quality metrics
  • Growth with retention
  • Revenue with customer acquisition cost

This prevents gaming and reveals the full picture.

4. Make Metrics Reviewable and Challengeable

Create forums where teams can question metrics:

  • How is this calculated?
  • What assumptions does it embed?
  • What are we missing?
  • Is this still the right thing to measure?

Metrics should evolve as the business evolves.

5. Celebrate Metric Deprecation

When a team proposes removing a metric because it no longer drives decisions, celebrate it. Metric proliferation is as dangerous as metric poverty. The best measurement systems are ruthlessly minimal.


Conclusion: Measure What Matters, Ignore the Rest

The chart of email metrics above is a perfect metaphor for modern product and technology organizations: lots of movement, impressive peaks, and very little signal about what actually drives business outcomes.

The solution isn't more metrics. It's better metrics—measurements that connect directly to outcomes, that resist gaming, that inform decisions rather than just decorating dashboards.

This requires discipline. It requires saying no to metrics that make us feel good but don't make us better. It requires asking hard questions about what we're really trying to learn and whether our measurements actually answer those questions.

Most importantly, it requires humility about what we can and cannot know. Not everything that matters can be measured, and not everything that can be measured matters.

The best product and technology leaders distinguish between the two.


Further Reading & Exploration Topics

Core Concepts:

  • Goodhart's Law and Campbell's Law in organizational contexts
  • The difference between correlation and causation in product analytics
  • Survivorship bias in metric interpretation

Measurement Philosophy:

  • How quantum mechanics' observer effect applies to business metrics
  • The role of qualitative research alongside quantitative metrics
  • Building "instrumentation debt" alongside technical debt

Industry-Specific Applications:

  • SaaS metrics: from MRR to net revenue retention
  • Marketplace metrics: balancing supply and demand side indicators
  • Platform metrics: measuring network effects and ecosystem health

Organizational Design:

  • How metric choice shapes team behavior and culture
  • Building data literacy across non-technical leadership
  • The role of data teams in challenging metric assumptions

If any of this resonates, you should subscribe.

No spam. No fluff. Just honest reflections on building products, leading teams, and staying curious.