#40 - Why AI Evaluations Have Never Been…

Apr 27

From traditional ML to GenAI: how evaluation became the frontline of AI product success.

1 Comment

Well said. GenAI testing does feel a lot like UX testing, even more so for features that allow free-form input from users.

Another thing I have observed is how leaders interpret these human testing results. Low test scores on the first try don't mean the feature has no potential. In fact, the better the testing setup, the worse the first-cut results are likely to look. With good, real test data, it becomes a lot easier to improve the feature.

