AI Evals & Discovery - All Things Product Podcast with Teresa Torres & Petra Wille

Listen to this episode on: Spotify | Apple Podcasts

Building AI products isn’t just about clever prompts and orchestration—it’s about knowing if what you’ve built actually works. In this episode, Teresa Torres and Petra Wille dive deep into AI evals: how they’re defined, why they’re essential, and how teams can implement them to ensure product quality.

Teresa shares her journey building her Interview Coach tool and the hard lessons she learned about evals along the way. From golden datasets and synthetic data to error analysis, code-based checks, and LLM-as-judge methods, you’ll walk away with a clearer picture of how to measure and improve AI products over time.

What you’ll learn in this episode:

What “evals” actually mean in the AI/ML world
Why evals are more than just quality assurance
The difference between golden datasets, synthetic data, and real-world traces
How to identify error modes and turn them into evals
When to use code-based evals vs. LLM-as-judge evals
How discovery practices inform every step of AI product evaluation
Why evals require continuous maintenance (and what “criteria drift” means for your product)
The relationship between evals, guardrails, and ongoing human oversight

Resources & Links:

Follow Teresa Torres: https://ProductTalk.org
Follow Petra Wille: https://Petra-Wille.com

Mentioned in the episode:

How I Designed & Implemented Evals for Product Talk’s Interview Coach by Teresa Torres
Teresa’s Interview Coach
ML (Machine learning)
Story-Based Customer Interviews - On Demand course by Teresa
LLM (Large language model)
AI Evals for Engineers and PMs course on Maven (get 35% off through Teresa’s link)
V0
JSON (JavaScript Object Notation)
Anthropic
The Product Leadership Wheel - A Framework for Defining and Growing Product Leadership at Scale by Petra Wille
Lovable
Behind the Scenes: Building the Product Talk Interview Coach by Teresa
Previous episode: Building AI Products

Coming soon from Teresa:

Weekly Monday posts sharing lessons learned while building AI products
A new podcast interviewing cross-functional teams about real-world AI product development stories

Petra Wille, 23 September 2025

