AI Evals & Discovery - All Things Product Podcast with Teresa Torres & Petra Wille
Listen to this episode on: Spotify | Apple Podcasts
Building AI products isn’t just about clever prompts and orchestration—it’s about knowing if what you’ve built actually works. In this episode, Teresa Torres and Petra Wille dive deep into AI evals: how they’re defined, why they’re essential, and how teams can implement them to ensure product quality.
Teresa shares her journey building her Interview Coach tool and the hard lessons she learned about evals along the way. From golden datasets and synthetic data to error analysis, code-based checks, and LLM-as-judge methods, you’ll walk away with a clearer picture of how to measure and improve AI products over time.
What you’ll learn in this episode:
What “evals” actually mean in the AI/ML world
Why evals are more than just quality assurance
The difference between golden datasets, synthetic data, and real-world traces
How to identify error modes and turn them into evals
When to use code-based evals vs. LLM-as-judge evals
How discovery practices inform every step of AI product evaluation
Why evals require continuous maintenance (and what “criteria drift” means for your product)
The relationship between evals, guardrails, and ongoing human oversight
Resources & Links:
Follow Teresa Torres: https://ProductTalk.org
Follow Petra Wille: https://Petra-Wille.com
Mentioned in the episode:
How I Designed & Implemented Evals for Product Talk’s Interview Coach by Teresa Torres
Teresa’s Interview Coach
Story-Based Customer Interviews - On Demand course by Teresa
AI Evals for Engineers and PMs course (get 35% off through Teresa’s link) on Maven
The Product Leadership Wheel - A Framework for Defining and Growing Product Leadership at Scale by Petra Wille
Behind the Scenes: Building the Product Talk Interview Coach by Teresa
Previous episode: Building AI Products
Coming soon from Teresa:
Weekly Monday posts sharing lessons learned while building AI products
A new podcast interviewing cross-functional teams about real-world AI product development stories