Hamel Husain, Shreya Shankar
Chapters
What are a couple of the most common misconceptions people have with evals?
Just what the heck are evals?
Is this their actual system prompt, by the way, for this company?
Now, for a product that is helping you with lead management, is that good?
Yeah, so what do you do with something like that?
Do you think we’ll get to a place where an agent can do this, where it has that context?
Do you have any specials?
Is that right?
(00:35:18): Now, do I like all the categories?
This is something you two developed based on your experience doing data analysis and data science…
So this also drives home the point that your open codes have to be detailed, right?
What comes next?
Maybe Shreya, just help us understand what code-based eval even is?
And the goal here is just to have a suite of tests that run before you ship to production that te…
Then I just go through and say, “Okay, when should you be doing a handoff?
Does that make sense?
What’s this about?
I know, obviously, it depends on the complexity of the product, but what’s a number in your experience?
What comes next after you’ve built your LLM judge?
There’s two things to that, right?
They’re like, “Yeah, we dogfooded,” but are they, really?
Just to add to the previous question a little bit, why is there this debate, A-B testing versus e…
Why are we the only two people doing this in the whole world?
You can see their eyes pop open and be like, “What do you mean?
How long do you spend on this?
We go through a lifecycle of error analysis, then automated evaluators, then how to improve your …
Then, you’ll charge for that later down the road?
The people that did that research?
What’s something about Shreya that you like most?
Key Concepts