When AI Lifts All Boats: Rethinking How We Evaluate Talent
The performance review season just got a lot more complicated.
For decades, product and engineering leaders evaluated talent using relatively straightforward proxies: lines of code shipped, features delivered, tickets closed, designs produced. Output was measurable. Productivity had clear signals. The best performers simply did more.
Then GitHub Copilot started writing roughly 40% of the code in files where it's enabled. GPT-4 began drafting product specs in minutes. Midjourney turned every PM into a visual designer. Suddenly, everyone's output metrics went up—sometimes dramatically.
The problem? When AI raises the floor for everyone, traditional performance indicators lose their signal. The engineer who used to ship 10 features now ships 15. But so does everyone else. The PM who once took days to write a PRD now does it in hours. But that's table stakes.
For CPOs and CTOs navigating 2024 and beyond, the question isn't whether AI will change how teams work—it already has. The real question is: How do you identify and develop exceptional talent when AI has commoditized raw output?
The answer requires a fundamental shift in what we measure, value, and reward.
The Collapse of Output as Signal
Paul Graham wrote in "Founder Mode" that the best founders don't delegate—they stay in the details. But there's a corollary for talent evaluation: the best builders don't just produce—they decide what to produce.
AI has created what Ethan Mollick at Wharton calls "the jagged frontier" of capability—some tasks AI handles brilliantly, others it fails at completely. The challenge for leaders is that AI's strengths (generation, iteration, synthesis) happen to overlap heavily with what we traditionally measured as "productivity."
Consider three scenarios:
Engineer A uses Copilot to write boilerplate code 3x faster. They close more tickets than anyone on the team.
Engineer B uses Copilot the same way, but also identifies that the team is solving the wrong problem—the feature they're building won't actually address the customer pain point. They propose a simpler architecture that eliminates the need for 60% of planned work.
Engineer C barely uses AI tools. They write less code, but the code they write becomes the foundation other engineers build on for years.
Traditional metrics favor Engineer A. Reality favors B and C.
This is the measurement crisis AI has created: volume is no longer a proxy for value.
The New Dimensions of Exceptional Talent
If output is democratized, what separates exceptional from average? Research and real-world evidence point to four critical dimensions:
1. Judgment Under Ambiguity
Ben Thompson of Stratechery has long argued that "aggregation theory" explains why platforms win—they own demand and commoditize supply. AI is now doing the same thing to execution. It's commoditizing the supply of code, copy, designs, and analysis.
What AI cannot commoditize is judgment about what to build and why.
The best product people have always been excellent at:
- Identifying which customer problems actually matter
- Distinguishing between symptoms and root causes
- Knowing when to say no to feature requests
- Understanding second-order effects of product decisions
AI can generate a dozen product solutions. It cannot tell you which one will create lasting value or which will create technical debt that haunts you for years.
Evaluation shift: Stop measuring how many specs someone writes. Start measuring:
- How often their product bets pay off over 6-12 month horizons
- Whether they can articulate why they're making specific tradeoffs
- Their ability to update their mental models when new data arrives
- How well they distinguish between "customer said they want X" and "customer actually needs Y"
Shreyas Doshi, former PM leader at Stripe, Twitter, and Google, frames this with his LNO framework—the ability to distinguish Leverage (high-impact), Neutral (maintenance), and Overhead (low-value) work. AI makes executing all three faster. Great PMs know which category a given piece of work falls into.
2. Initiative and Problem-Finding
Most organizations are decent at problem-solving. They're terrible at problem-finding.
AI supercharges problem-solving. Give it a well-defined problem, and it will generate solutions, code implementations, test cases, documentation. But AI doesn't walk the floor, notice the customer support team is drowning in the same question, and connect that to a product gap.
Great talent in the AI era actively hunts for problems worth solving.
This mirrors what Andy Grove called "task-relevant maturity" in High Output Management: people with high maturity in a task need only broad objectives, not step-by-step direction. The best people don't wait to be told what to do.
Consider how Figma's Dylan Field has described their early product development: rather than asking designers what features they wanted, the team watched where designers struggled and built solutions for problems they didn't know they had.
That's initiative. That's problem-finding. AI doesn't do that.
Evaluation shift: Track:
- How often someone surfaces problems before they become crises
- Whether they bring solutions, not just complaints
- Their ability to connect dots across teams or customer segments
- How frequently they identify opportunities outside their immediate scope
The engineer who notices the API design will cause scaling issues in six months is more valuable than the one who writes elegant code for a fundamentally flawed architecture—even if the latter "ships more."
3. Adaptability and Learning Velocity
In "The Innovator's Dilemma," Clayton Christensen showed how successful companies fail by optimizing for existing markets while missing disruptive shifts. The same dynamic applies to individuals.
AI is creating continuous disruption in how work gets done. The tools change every quarter. Best practices from 2023 are outdated in 2024. Workflows that seemed permanent are suddenly obsolete.
The most valuable people are the fastest learners.
This isn't about technical skills alone. It's about:
- Comfort with ambiguity and change
- Willingness to abandon approaches that worked yesterday
- Ability to synthesize new information quickly
- Speed of mental model updates
Satya Nadella famously shifted Microsoft's culture from "know-it-alls" to "learn-it-alls." That shift is even more critical now. The half-life of any specific skill is shrinking. The ability to acquire new skills is appreciating.
Evaluation shift: Measure:
- How quickly someone becomes productive with new tools or domains
- Their willingness to experiment with new approaches
- Whether they update their methods based on feedback
- How they respond when their expertise becomes obsolete
The PM who learns to prompt-engineer effectively and integrates AI into their workflow is more valuable than the one who resists because "that's not how we've always done it"—even if the latter has more years of experience.
4. Customer Impact and Outcome Orientation
Here's a truth that makes many builders uncomfortable: customers don't care how you built something. They care whether it solves their problem.
AI makes it easier than ever to build something. It doesn't make it easier to build the right something.
Des Traynor, co-founder of Intercom, has written extensively about how AI changes product strategy. His key insight: "AI doesn't change what customers want. It changes what's possible to deliver."
The best product and engineering talent obsess over outcomes, not outputs. They:
- Instrument everything to understand actual usage
- Talk to customers constantly
- Kill their own features when data shows they're not working
- Focus on problems, not solutions
AI can generate features. It cannot tell you if those features matter.
Evaluation shift: Focus on:
- Measurable customer outcomes (retention, satisfaction, behavior change)
- Ability to articulate customer problems in depth
- Willingness to kill work that isn't driving impact
- Connection between their work and business metrics
The designer who ships fewer screens but dramatically improves conversion is more valuable than the one who ships beautiful designs that users ignore—even if the latter has a more impressive portfolio.
Practical Frameworks for Evaluation
Theory is useful. Implementation is hard. Here are concrete approaches for CPOs and CTOs:
The "AI-Adjusted Performance Review"
Traditional performance reviews measure:
- Volume of work
- Speed of delivery
- Technical complexity
AI-era reviews should measure:
- Decision quality: What percentage of their product/technical decisions proved correct over time?
- Problem identification: How many important problems did they surface that weren't on anyone's radar?
- Leverage: What's the ratio of impact to effort in their work?
- Learning velocity: How quickly did they adapt to new tools, domains, or requirements?
- Outcome ownership: What measurable customer or business outcomes improved because of their work?
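As a sketch, the five dimensions above could be captured in a lightweight rubric so reviews compare like with like across a team. The field names, scales, and the example values below are illustrative assumptions, not a prescribed standard; "leverage" is simply the impact-to-effort ratio from the list.

```python
from dataclasses import dataclass

# Illustrative AI-era review rubric. Dimension names follow the list
# above; the units and 0-1 scales are assumptions for this sketch.
@dataclass
class ImpactReview:
    decision_quality: float   # fraction of decisions that proved correct over time (0-1)
    problems_surfaced: int    # important problems raised that weren't on anyone's radar
    impact: float             # estimated impact of shipped work (team-agreed units)
    effort: float             # effort spent (e.g. person-weeks)
    learning_velocity: float  # manager/self rating of adaptation speed (0-1)
    outcome_delta: float      # measured change in a customer or business metric

    @property
    def leverage(self) -> float:
        """The 'leverage' dimension: ratio of impact to effort."""
        return self.impact / self.effort if self.effort else 0.0

# Hypothetical example values:
review = ImpactReview(decision_quality=0.7, problems_surfaced=3,
                      impact=12.0, effort=4.0,
                      learning_velocity=0.8, outcome_delta=0.05)
print(review.leverage)  # 3.0
```

The point of structuring it this way is that only one field (effort) rewards doing more; every other field rewards deciding, noticing, adapting, or moving an outcome.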
The "Force Multiplier" Test
Ask: If this person left tomorrow, what would break that AI couldn't fix?
- Would we lose institutional knowledge about why we built things certain ways?
- Would we lose relationships with key customers or partners?
- Would we lose the ability to make hard tradeoff decisions?
- Would we lose someone who unblocks others and raises the team's performance?
The best people are force multipliers. AI is a force multiplier too, but it only multiplies what already exists. Exceptional people multiply the team's capability in ways AI cannot.
The "Complexity Navigator" Assessment
Evaluate how people handle increasing complexity:
- Level 1: Can execute well-defined tasks (AI can do this)
- Level 2: Can solve defined problems independently (AI is getting good at this)
- Level 3: Can identify which problems to solve (AI struggles here)
- Level 4: Can navigate ambiguous situations with competing priorities (AI cannot do this)
- Level 5: Can reshape how the organization thinks about problems (AI definitely cannot do this)
Most talent evaluation has focused on Levels 1-2. The future requires focusing on Levels 3-5.
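Because the levels are ordered, they work as a simple ordinal scale for calibration exercises. A minimal sketch, assuming the shortened level names below (they are paraphrases of the descriptions above, not official terminology):

```python
from enum import IntEnum

class ComplexityLevel(IntEnum):
    EXECUTE_DEFINED_TASKS = 1    # Level 1: AI can do this
    SOLVE_DEFINED_PROBLEMS = 2   # Level 2: AI is getting good at this
    IDENTIFY_PROBLEMS = 3        # Level 3: AI struggles here
    NAVIGATE_AMBIGUITY = 4       # Level 4: AI cannot do this
    RESHAPE_ORG_THINKING = 5     # Level 5: AI definitely cannot do this

def beyond_ai_frontier(level: ComplexityLevel) -> bool:
    """Levels 3-5: the tiers future evaluation should focus on."""
    return level >= ComplexityLevel.IDENTIFY_PROBLEMS

print(beyond_ai_frontier(ComplexityLevel.SOLVE_DEFINED_PROBLEMS))  # False
print(beyond_ai_frontier(ComplexityLevel.NAVIGATE_AMBIGUITY))      # True
```

In a calibration session, rating each person's recent work against this scale makes the Levels 1-2 vs. Levels 3-5 distinction explicit rather than implied.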
The "Judgment Audit"
Track decisions over time:
- What did this person recommend 6 months ago?
- What actually happened?
- What did they learn?
- How did they update their approach?
Create a "decision log" culture where important choices are documented with reasoning. Review them quarterly. The people whose judgment consistently proves sound are your most valuable assets—regardless of how much they "ship."
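A decision-log culture needs very little tooling. The sketch below pairs each recommendation with its prediction and, later, its actual outcome, and flags entries old enough for a quarterly audit. Field names, the 6-month horizon, and the example entries are assumptions for illustration:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DecisionRecord:
    owner: str
    made_on: date
    recommendation: str
    reasoning: str            # why, documented at decision time
    predicted_outcome: str
    actual_outcome: str = ""  # filled in at audit time
    lesson: str = ""          # what was learned; how the approach updated

def due_for_audit(log, today, horizon_days=180):
    """Decisions at least ~6 months old that haven't been reviewed yet."""
    return [d for d in log
            if (today - d.made_on).days >= horizon_days and not d.actual_outcome]

# Hypothetical log entries:
log = [
    DecisionRecord("PM-1", date(2024, 1, 10), "Build bulk-edit first",
                   "Top support complaint", "Ticket volume drops 20%"),
    DecisionRecord("Eng-2", date(2024, 6, 1), "Rewrite sync service",
                   "Scaling risk at current growth", "p99 latency halves"),
]
print(len(due_for_audit(log, date(2024, 7, 15))))  # 1 (only the January decision)
```

The structure matters more than the tool: forcing `reasoning` and `predicted_outcome` to be written down at decision time is what makes the later audit honest.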
Building Systems That Surface the Right Signal
Measurement doesn't exist in a vacuum. The systems you build shape what people optimize for.
1. Rewrite Your Job Descriptions
Most engineering and product job descriptions still emphasize:
- "Ship features quickly"
- "Write clean code"
- "Deliver on commitments"
These are table stakes now. Rewrite them to emphasize:
- "Identify high-leverage problems"
- "Make sound decisions under uncertainty"
- "Drive measurable customer outcomes"
- "Adapt quickly to changing contexts"
2. Change Your Interview Process
Stop asking:
- "How would you build X?" (AI can answer this)
- "What's your technical background?" (becoming less predictive)
Start asking:
- "Tell me about a time you killed your own work because data showed it wasn't working"
- "Describe a situation where you identified a problem no one else saw"
- "How do you decide what not to build?"
- "Walk me through how you've changed your approach to [domain] over the past year"
Lenny Rachitsky's interview with Elena Verna highlights how top growth leaders evaluate talent: "I don't care about their resume. I care about how they think about problems."
3. Implement "Impact Reviews" Not "Performance Reviews"
Shift from:
- Quarterly reviews of output metrics
- Ratings based on volume of work
- Comparisons to peers on productivity
To:
- Continuous feedback on decision quality
- Retrospectives on outcomes vs. predictions
- Assessments of learning and adaptation
- Customer impact as the primary metric
Spotify's "Squad Health Check" model offers a useful template—it measures team health across multiple dimensions, not just velocity.
4. Create "Judgment Development" Programs
If judgment is the new scarce resource, invest in developing it:
- Run regular "decision autopsies" where teams review past choices
- Create mentorship programs pairing junior talent with those who have strong track records
- Build case study libraries of product and technical decisions
- Encourage writing and documentation of reasoning, not just conclusions
Amazon's "six-page memo" culture is valuable precisely because it forces clarity of thinking. AI can draft the memo. It cannot provide the judgment that goes into it.
The Cultural Shift Required
None of this works without cultural change. Organizations must shift from:
- Rewarding "busy-ness" → Rewarding impact
- Valuing speed → Valuing direction
- Celebrating output → Celebrating outcomes
- Promoting based on seniority → Promoting based on judgment
This is uncomfortable. It requires:
- Letting go of comfortable metrics
- Accepting that some valuable work is invisible
- Trusting people to define their own high-impact work
- Being willing to say "you shipped a lot, but it didn't matter"
Stripe's Patrick Collison and Tyler Cowen have written about the need for "progress studies"—understanding what actually drives innovation. The same thinking applies internally: What actually drives customer value? What actually makes teams effective?
The answers are rarely the things that are easiest to measure.
What This Means for Career Development
For individual contributors and managers navigating this shift:
If you're early in your career:
- Use AI aggressively to accelerate learning, not to avoid it
- Focus on building judgment through rapid experimentation
- Seek projects with ambiguous requirements
- Learn to articulate why you're making decisions, not just what you're building
If you're mid-career:
- Shift from execution to decision-making
- Develop expertise in problem identification, not just problem-solving
- Build relationships with customers and stakeholders
- Become the person others come to for judgment calls
If you're a leader:
- Model the behavior you want to see
- Protect time for strategic thinking
- Build systems that reward the right things
- Be willing to promote based on judgment, not just output
The Paradox of AI and Human Value
Here's the paradox: AI makes human judgment more valuable by making everything else cheaper.
When execution is abundant, strategy becomes scarce. When code is commoditized, architecture becomes critical. When everyone can ship features, knowing which features to ship becomes the bottleneck.
This mirrors what happened in previous technology shifts. When manufacturing became automated, design became more valuable. When information became freely available, curation became more valuable. When distribution became democratized, taste became more valuable.
AI is doing to knowledge work what automation did to manufacturing: raising the floor and widening the gap between good and great.
The CPOs and CTOs who understand this will build better products with better teams. Those who don't will wonder why their increasingly "productive" teams aren't actually winning in the market.
Key Takeaways
- Output metrics are losing signal fast. Lines of code, features shipped, and tickets closed are poor proxies for value when AI lifts everyone's productivity.
- Judgment is the new scarce resource. The ability to decide what to build, identify important problems, and make sound tradeoffs cannot be automated.
- Initiative and problem-finding separate good from great. AI solves problems you give it. Exceptional talent finds problems worth solving.
- Adaptability matters more than expertise. The half-life of specific skills is shrinking. Learning velocity is appreciating.
- Customer outcomes are the ultimate metric. Measure impact, not activity. The best builders obsess over whether their work actually matters to customers.
- Your evaluation systems shape behavior. If you measure output, people will optimize for output. If you measure judgment and impact, they'll optimize for that instead.
- This requires cultural change, not just process change. You can't bolt new metrics onto old cultures and expect transformation.
Further Reading & References
On AI and Productivity:
- Ethan Mollick: "Centaurs and Cyborgs on the Jagged Frontier" - essay summarizing the Harvard Business School/BCG study of AI and knowledge work
- "The AI Revolution in Software Development" - GitHub's research on Copilot's impact
On Judgment and Decision-Making:
- Shreyas Doshi: "The LNO Framework" - Distinguishing leverage, neutral, and overhead work
- Annie Duke: "Thinking in Bets" - Decision-making under uncertainty
- Daniel Kahneman: "Noise: A Flaw in Human Judgment" - Understanding variability in judgment
On Product Strategy:
- Des Traynor: "Intercom on Product Management"
- Ben Thompson: "Stratechery" - Particularly his work on aggregation theory and AI
- Marty Cagan: "Transformed" - Product operating models
On Talent and Organizations:
- Andy Grove: "High Output Management"
- Laszlo Bock: "Work Rules!" - Google's approach to talent
- Patty McCord: "Powerful" - Netflix's culture of freedom and responsibility
On Learning and Adaptation:
- Carol Dweck: "Mindset" - Growth mindset research
- Satya Nadella: "Hit Refresh" - Microsoft's cultural transformation
Topics for Deeper Exploration:
- How AI changes the economics of software development
- Building "judgment infrastructure" in organizations
- The role of taste and curation in AI-augmented work
- Measuring and developing strategic thinking
- Creating feedback loops that improve decision quality
- The future of technical leadership when code is commoditized
- How customer research methods must evolve with AI tools
- Building teams optimized for adaptation rather than execution
The transition from measuring output to measuring judgment is uncomfortable. It requires letting go of metrics that felt objective and embracing assessment that feels subjective. But the alternative—continuing to evaluate talent using metrics that AI has made meaningless—is far worse.
The organizations that make this shift early will attract and retain the talent that actually drives innovation. Those that don't will find themselves with teams that are "productive" on paper but losing in the market.
The choice is clear. The implementation is hard. But the best product and engineering leaders have never shied away from hard problems—especially when they're the right problems to solve.