When AI Lifts All Boats: Rethinking How We Evaluate Talent
The performance review season just got a lot more complicated.
For decades, product and engineering leaders evaluated talent using relatively straightforward proxies: lines of code shipped, features delivered, tickets closed, designs produced. Output was measurable. Productivity had clear signals. The best performers simply did more.
Then GitHub Copilot started writing roughly 40% of the code in files where it's enabled. GPT-4 began drafting product specs in minutes. Midjourney turned every PM into a visual designer. Suddenly, everyone's output metrics went up—sometimes dramatically.
The problem? When AI raises the floor for everyone, traditional performance indicators lose their signal. The engineer who used to ship 10 features now ships 15. But so does everyone else. The PM who once took days to write a PRD now does it in hours. But that's table stakes.
For CPOs and CTOs navigating 2024 and beyond, the question isn't whether AI will change how teams work—it already has. The real question is: How do you identify and develop exceptional talent when AI has commoditized raw output?
The answer requires a fundamental shift in what we measure, value, and reward.
The Collapse of Output as Signal
Paul Graham wrote in "Founder Mode" that the best founders don't delegate—they stay in the details. But there's a corollary for talent evaluation: the best builders don't just produce—they decide what to produce.
AI has created what Ethan Mollick at Wharton calls "the jagged frontier" of capability—some tasks AI handles brilliantly, others it fails at completely. The challenge for leaders is that AI's strengths (generation, iteration, synthesis) happen to overlap heavily with what we traditionally measured as "productivity."
Consider three scenarios:
Engineer A uses Copilot to write boilerplate code 3x faster. They close more tickets than anyone on the team.
Engineer B uses Copilot the same way, but also identifies that the team is solving the wrong problem—the feature they're building won't actually address the customer pain point. They propose a simpler architecture that eliminates the need for 60% of planned work.
Engineer C barely uses AI tools. They write less code, but the code they write becomes the foundation other engineers build on for years.
Traditional metrics favor Engineer A. Reality favors B and C.
This is the measurement crisis AI has created: volume is no longer a proxy for value.
The New Dimensions of Exceptional Talent
If output is democratized, what separates exceptional from average? Research and real-world evidence point to four critical dimensions:
1. Judgment Under Ambiguity
Ben Thompson of Stratechery has long argued that "aggregation theory" explains why platforms win—they own demand and commoditize supply. AI is now doing the same thing to execution. It's commoditizing the supply of code, copy, designs, and analysis.
What AI cannot commoditize is judgment about what to build and why.
The best product people have always been excellent at:
- Identifying which customer problems actually matter
- Distinguishing between symptoms and root causes
- Knowing when to say no to feature requests
- Understanding second-order effects of product decisions
AI can generate a dozen product solutions. It cannot tell you which one will create lasting value or which will create technical debt that haunts you for years.
Evaluation shift: Stop measuring how many specs someone writes. Start measuring:
- How often their product bets pay off over 6-12 month horizons
- Whether they can articulate why they're making specific tradeoffs
- Their ability to update their mental models when new data arrives
- How well they distinguish between "customer said they want X" and "customer actually needs Y"
Shreyas Doshi, former PM leader at Stripe, Twitter, and Google, frames this with his LNO framework—the ability to distinguish Leverage (high-impact), Neutral (maintenance), and Overhead (low-value) work. AI makes executing all three faster. Great PMs know which category a given piece of work falls into.
2. Initiative and Problem-Finding
Most organizations are decent at problem-solving. They're terrible at problem-finding.
AI supercharges problem-solving. Give it a well-defined problem, and it will generate solutions, code implementations, test cases, documentation. But AI doesn't walk the floor, notice the customer support team is drowning in the same question, and connect that to a product gap.
Great talent in the AI era actively hunts for problems worth solving.
This mirrors what Andy Grove called "task-relevant maturity" in High Output Management: people with high maturity in a task need only broad objectives, not step-by-step direction. The best people don't wait to be told what to do.
Consider how Figma's Dylan Field has described their early product development: rather than asking designers what features they wanted, the team watched where designers struggled and built solutions for problems they didn't know they had.
That's initiative. That's problem-finding. AI doesn't do that.
Evaluation shift: Track:
- How often someone surfaces problems before they become crises
- Whether they bring solutions, not just complaints
- Their ability to connect dots across teams or customer segments
- How frequently they identify opportunities outside their immediate scope
The engineer who notices the API design will cause scaling issues in six months is more valuable than the one who writes elegant code for a fundamentally flawed architecture—even if the latter "ships more."
3. Adaptability and Learning Velocity
In "The Innovator's Dilemma," Clayton Christensen showed how successful companies fail by optimizing for existing markets while missing disruptive shifts. The same dynamic applies to individuals.
AI is creating continuous disruption in how work gets done. The tools change every quarter. Best practices from 2023 are outdated in 2024. Workflows that seemed permanent are suddenly obsolete.
The most valuable people are the fastest learners.
This isn't about technical skills alone. It's about:
- Comfort with ambiguity and change
- Willingness to abandon approaches that worked yesterday
- Ability to synthesize new information quickly
- Speed of mental model updates
Satya Nadella famously shifted Microsoft's culture from "know-it-alls" to "learn-it-alls." That shift is even more critical now. The half-life of any specific skill is shrinking. The ability to acquire new skills is appreciating.
Evaluation shift: Measure:
- How quickly someone becomes productive with new tools or domains
- Their willingness to experiment with new approaches
- Whether they update their methods based on feedback
- How they respond when their expertise becomes obsolete
The PM who learns to prompt-engineer effectively and integrates AI into their workflow is more valuable than the one who resists because "that's not how we've always done it"—even if the latter has more years of experience.
4. Customer Impact and Outcome Orientation
Here's a truth that makes many builders uncomfortable: customers don't care how you built something. They care whether it solves their problem.
AI makes it easier than ever to build something. It doesn't make it easier to build the right something.
Des Traynor, co-founder of Intercom, has written extensively about how AI changes product strategy. His key insight: "AI doesn't change what customers want. It changes what's possible to deliver."
The best product and engineering talent obsess over outcomes, not outputs. They:
- Instrument everything to understand actual usage
- Talk to customers constantly
- Kill their own features when data shows they're not working
- Focus on problems, not solutions
AI can generate features. It cannot tell you if those features matter.
Evaluation shift: Focus on:
- Measurable customer outcomes (retention, satisfaction, behavior change)
- Ability to articulate customer problems in depth
- Willingness to kill work that isn't driving impact
- Connection between their work and business metrics
The designer who ships fewer screens but dramatically improves conversion is more valuable than the one who ships beautiful designs that users ignore—even if the latter has a more impressive portfolio.
Practical Frameworks for Evaluation
Theory is useful. Implementation is hard. Here are concrete approaches for CPOs and CTOs:
The "AI-Adjusted Performance Review"
Traditional performance reviews measure:
- Volume of work
- Speed of delivery
- Technical complexity
AI-era reviews should measure:
- Decision quality: What percentage of their product/technical decisions proved correct over time?
- Problem identification: How many important problems did they surface that weren't on anyone's radar?
- Leverage: What's the ratio of impact to effort in their work?
- Learning velocity: How quickly did they adapt to new tools, domains, or requirements?
- Outcome ownership: What measurable customer or business outcomes improved because of their work?
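As a sketch, the five dimensions above could be captured in a lightweight rubric so reviews compare like with like across a team. The field names, scales, and the example values below are illustrative assumptions, not a prescribed standard; "leverage" is simply the impact-to-effort ratio from the list.

```python
from dataclasses import dataclass

# Illustrative AI-era review rubric. Dimension names follow the list
# above; the units and 0-1 scales are assumptions for this sketch.
@dataclass
class ImpactReview:
    decision_quality: float   # fraction of decisions that proved correct over time (0-1)
    problems_surfaced: int    # important problems raised that weren't on anyone's radar
    impact: float             # estimated impact of shipped work (team-agreed units)
    effort: float             # effort spent (e.g. person-weeks)
    learning_velocity: float  # manager/self rating of adaptation speed (0-1)
    outcome_delta: float      # measured change in a customer or business metric

    @property
    def leverage(self) -> float:
        """The 'leverage' dimension: ratio of impact to effort."""
        return self.impact / self.effort if self.effort else 0.0

# Hypothetical example values:
review = ImpactReview(decision_quality=0.7, problems_surfaced=3,
                      impact=12.0, effort=4.0,
                      learning_velocity=0.8, outcome_delta=0.05)
print(review.leverage)  # 3.0
```

The point of structuring it this way is that only one field (effort) rewards doing more; every other field rewards deciding, noticing, adapting, or moving an outcome.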
The "Force Multiplier" Test
Ask: If this person left tomorrow, what would break that AI couldn't fix?
- Would we lose institutional knowledge about why we built things certain ways?
- Would we lose relationships with key customers or partners?
- Would we lose the ability to make hard tradeoff decisions?
- Would we lose someone who unblocks others and raises the team's performance?
The best people are force multipliers. AI is a force multiplier too, but it only multiplies what already exists. Exceptional people multiply the team's capability in ways AI cannot.
The "Complexity Navigator" Assessment
Evaluate how people handle increasing complexity:
- Level 1: Can execute well-defined tasks (AI can do this)
- Level 2: Can solve defined problems independently (AI is getting good at this)
- Level 3: Can identify which problems to solve (AI struggles here)
- Level 4: Can navigate ambiguous situations with competing priorities (AI cannot do this)
- Level 5: Can reshape how the organization thinks about problems (AI definitely cannot do this)
Most talent evaluation has focused on Levels 1-2. The future requires focusing on Levels 3-5.
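Because the levels are ordered, they work as a simple ordinal scale for calibration exercises. A minimal sketch, assuming the shortened level names below (they are paraphrases of the descriptions above, not official terminology):

```python
from enum import IntEnum

class ComplexityLevel(IntEnum):
    EXECUTE_DEFINED_TASKS = 1    # Level 1: AI can do this
    SOLVE_DEFINED_PROBLEMS = 2   # Level 2: AI is getting good at this
    IDENTIFY_PROBLEMS = 3        # Level 3: AI struggles here
    NAVIGATE_AMBIGUITY = 4       # Level 4: AI cannot do this
    RESHAPE_ORG_THINKING = 5     # Level 5: AI definitely cannot do this

def beyond_ai_frontier(level: ComplexityLevel) -> bool:
    """Levels 3-5: the tiers future evaluation should focus on."""
    return level >= ComplexityLevel.IDENTIFY_PROBLEMS

print(beyond_ai_frontier(ComplexityLevel.SOLVE_DEFINED_PROBLEMS))  # False
print(beyond_ai_frontier(ComplexityLevel.NAVIGATE_AMBIGUITY))      # True
```

In a calibration session, rating each person's recent work against this scale makes the Levels 1-2 vs. Levels 3-5 distinction explicit rather than implied.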
The "Judgment Audit"
Track decisions over time:
- What did this person recommend 6 months ago?
- What actually happened?
- What did they learn?
- How did they update their approach?
Create a "decision log" culture where important choices are documented with reasoning. Review them quarterly. The people whose judgment consistently proves sound are your most valuable assets—regardless of how much they "ship."
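A decision-log culture needs very little tooling. The sketch below pairs each recommendation with its prediction and, later, its actual outcome, and flags entries old enough for a quarterly audit. Field names, the 6-month horizon, and the example entries are assumptions for illustration:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DecisionRecord:
    owner: str
    made_on: date
    recommendation: str
    reasoning: str            # why, documented at decision time
    predicted_outcome: str
    actual_outcome: str = ""  # filled in at audit time
    lesson: str = ""          # what was learned; how the approach updated

def due_for_audit(log, today, horizon_days=180):
    """Decisions at least ~6 months old that haven't been reviewed yet."""
    return [d for d in log
            if (today - d.made_on).days >= horizon_days and not d.actual_outcome]

# Hypothetical log entries:
log = [
    DecisionRecord("PM-1", date(2024, 1, 10), "Build bulk-edit first",
                   "Top support complaint", "Ticket volume drops 20%"),
    DecisionRecord("Eng-2", date(2024, 6, 1), "Rewrite sync service",
                   "Scaling risk at current growth", "p99 latency halves"),
]
print(len(due_for_audit(log, date(2024, 7, 15))))  # 1 (only the January decision)
```

The structure matters more than the tool: forcing `reasoning` and `predicted_outcome` to be written down at decision time is what makes the later audit honest.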
Building Systems That Surface the Right Signal
Measurement doesn't exist in a vacuum. The systems you build shape what people optimize for.
1. Rewrite Your Job Descriptions
Most engineering and product job descriptions still emphasize:
- "Ship features quickly"
- "Write clean code"
- "Deliver on commitments"
These are table stakes now. Rewrite them to emphasize:
- "Identify high-leverage problems"
- "Make sound decisions under uncertainty"
- "Drive measurable customer outcomes"
- "Adapt quickly to changing contexts"
2. Change Your Interview Process
Stop asking:
- "How would you build X?" (AI can answer this)
- "What's your technical background?" (becoming less predictive)
Start asking:
- "Tell me about a time you killed your own work because data showed it wasn't working"
- "Describe a situation where you identified a problem no one else saw"
- "How do you decide what not to build?"
- "Walk me through how you've changed your approach to [domain] over the past year"
Lenny Rachitsky's interview with Elena Verna highlights how top growth leaders evaluate talent: "I don't care about their resume. I care about how they think about problems."
3. Implement "Impact Reviews" Not "Performance Reviews"
Shift from:
- Quarterly reviews of output metrics
- Ratings based on volume of work
- Comparisons to peers on productivity
To:
- Continuous feedback on decision quality
- Retrospectives on outcomes vs. predictions
- Assessments of learning and adaptation
- Customer impact as the primary metric
Spotify's "Squad Health Check" model offers a useful template—it measures team health across multiple dimensions, not just velocity.
4. Create "Judgment Development" Programs
If judgment is the new scarce resource, invest in developing it:
- Run regular "decision autopsies" where teams review past choices
- Create mentorship programs pairing junior talent with those who have strong track records
- Build case study libraries of product and technical decisions
- Encourage writing and documentation of reasoning, not just conclusions
Amazon's "six-page memo" culture is valuable precisely because it forces clarity of thinking. AI can draft the memo. It cannot provide the judgment that goes into it.
The Cultural Shift Required
None of this works without cultural change. Organizations must shift from:
- Rewarding "busy-ness" → Rewarding impact
- Valuing speed → Valuing direction
- Celebrating output → Celebrating outcomes
- Promoting based on seniority → Promoting based on judgment
This is uncomfortable. It requires:
- Letting go of comfortable metrics
- Accepting that some valuable work is invisible
- Trusting people to define their own high-impact work
- Being willing to say "you shipped a lot, but it didn't matter"
Stripe's Patrick Collison and Tyler Cowen have written about the need for "progress studies"—understanding what actually drives innovation. The same thinking applies internally: What actually drives customer value? What actually makes teams effective?
The answers are rarely the things that are easiest to measure.
What This Means for Career Development
For individual contributors and managers navigating this shift:
If you're early in your career:
- Use AI aggressively to accelerate learning, not to avoid it
- Focus on building judgment through rapid experimentation
- Seek projects with ambiguous requirements
- Learn to articulate why you're making decisions, not just what you're building
If you're mid-career:
- Shift from execution to decision-making
- Develop expertise in problem identification, not just problem-solving
- Build relationships with customers and stakeholders
- Become the person others come to for judgment calls
If you're a leader:
- Model the behavior you want to see
- Protect time for strategic thinking
- Build systems that reward the right things
- Be willing to promote based on judgment, not just output
The Paradox of AI and Human Value
Here's the paradox: AI makes human judgment more valuable by making everything else cheaper.
When execution is abundant, strategy becomes scarce. When code is commoditized, architecture becomes critical. When everyone can ship features, knowing which features to ship becomes the bottleneck.
This mirrors what happened in previous technology shifts. When manufacturing became automated, design became more valuable. When information became freely available, curation became more valuable. When distribution became democratized, taste became more valuable.
AI is doing to knowledge work what automation did to manufacturing: raising the floor and widening the gap between good and great.
The CPOs and CTOs who understand this will build better products with better teams. Those who don't will wonder why their increasingly "productive" teams aren't actually winning in the market.
Key Takeaways
- Output metrics are losing signal fast. Lines of code, features shipped, and tickets closed are poor proxies for value when AI lifts everyone's productivity.
- Judgment is the new scarce resource. The ability to decide what to build, identify important problems, and make sound tradeoffs cannot be automated.
- Initiative and problem-finding separate good from great. AI solves problems you give it. Exceptional talent finds problems worth solving.
- Adaptability matters more than expertise. The half-life of specific skills is shrinking. Learning velocity is appreciating.
- Customer outcomes are the ultimate metric. Measure impact, not activity. The best builders obsess over whether their work actually matters to customers.
- Your evaluation systems shape behavior. If you measure output, people will optimize for output. If you measure judgment and impact, they'll optimize for that instead.
- This requires cultural change, not just process change. You can't bolt new metrics onto old cultures and expect transformation.
Further Reading & References
On AI and Productivity:
- Ethan Mollick: "Centaurs and Cyborgs on the Jagged Frontier" - essay summarizing the Harvard Business School/BCG study of AI and knowledge work
- "The AI Revolution in Software Development" - GitHub's research on Copilot's impact
On Judgment and Decision-Making:
- Shreyas Doshi: "The LNO Framework" - Distinguishing leverage, neutral, and overhead work
- Annie Duke: "Thinking in Bets" - Decision-making under uncertainty
- Daniel Kahneman: "Noise: A Flaw in Human Judgment" - Understanding variability in judgment
On Product Strategy:
- Des Traynor: "Intercom on Product Management"
- Ben Thompson: "Stratechery" - Particularly his work on aggregation theory and AI
- Marty Cagan: "Transformed" - Product operating models
On Talent and Organizations:
- Andy Grove: "High Output Management"
- Laszlo Bock: "Work Rules!" - Google's approach to talent
- Patty McCord: "Powerful" - Netflix's culture of freedom and responsibility
On Learning and Adaptation:
- Carol Dweck: "Mindset" - Growth mindset research
- Satya Nadella: "Hit Refresh" - Microsoft's cultural transformation
Topics for Deeper Exploration:
- How AI changes the economics of software development
- Building "judgment infrastructure" in organizations
- The role of taste and curation in AI-augmented work
- Measuring and developing strategic thinking
- Creating feedback loops that improve decision quality
- The future of technical leadership when code is commoditized
- How customer research methods must evolve with AI tools
- Building teams optimized for adaptation rather than execution
The transition from measuring output to measuring judgment is uncomfortable. It requires letting go of metrics that felt objective and embracing assessment that feels subjective. But the alternative—continuing to evaluate talent using metrics that AI has made meaningless—is far worse.
The organizations that make this shift early will attract and retain the talent that actually drives innovation. Those that don't will find themselves with teams that are "productive" on paper but losing in the market.
The choice is clear. The implementation is hard. But the best product and engineering leaders have never shied away from hard problems—especially when they're the right problems to solve.