#beyondAI
Some years ago, I would never have expected that almost everyone would understand what I mean when I talk about the cost of wrong AI.
These days, nearly everyone has paid that cost, at least with their nerves, while interacting with an AI. We’ve all faced frustrating moments with the most common AI tools: chatbots that give nonsense answers, recommendation engines suggesting irrelevant content, or voice assistants misunderstanding simple commands.
But there’s a cost that goes far beyond personal annoyance, frustration, or headaches. This deeper cost is felt by businesses that integrate AI into their processes and workflows.
With AI products, there’s a critical dimension that sits above all the usual measures of product success: the quality of the AI’s output.
An AI product might address a user’s pain point beautifully. It might have a sleek, intuitive interface and a high-performing AI model. But ultimately, the product’s value stands or falls on how reliable, accurate, and appropriate the AI’s output is. And no matter how well the model performs, it is never perfect.
Whether you’re working with predictive models or generative systems, the AI model is the beating heart of your solution. Its outputs define the product’s quality in ways that are far more volatile and impactful than most traditional software features.
This is why AI product success hinges first and foremost on the quality of the outputs your model generates, assuming of course that the problem you’re solving is genuinely valuable to a specific user group.
Long before UI polish, feature richness, or clever pricing strategies, the key question remains:
Can users trust what the AI produces?
Because when AI goes wrong, the cost to the business can quickly outweigh the value of all the times it was right and beneficial.
That’s why every AI Product Manager needs to learn how to measure, monitor, and improve AI output quality.
This is what today’s article is about: the cost of wrong AI.
All I Know: Not All AI Is Measured Alike — and That’s Enough for Now
Not all AI is created equal, and neither are its outputs.
A predictive model might forecast a sales figure, classify a customer’s sentiment, or flag a suspicious transaction. Its output is usually structured, numeric, or categorical—something you can measure directly against the truth. Metrics like accuracy, precision, recall, F1 scores, or ROC curves are well-established and relatively straightforward to track.
Generative AI, however, operates in a completely different arena. Its outputs are creative, open-ended, and often subjective. A large language model might draft marketing copy, summarize a report, or generate code. An image model might produce new artwork or product mockups. In these cases, the “correctness” of the output isn’t always a simple yes-or-no answer. Instead, it sits on a spectrum. The quality of these outputs can depend on style, tone, factual accuracy, coherence, relevance, and even subtle nuances like empathy or humor.
Because of these differences, the way we assess output performance varies dramatically across AI types:
For predictive AI, we measure how close the output is to a known ground truth.
For generative AI, we often have to define what “good” looks like for our specific use case, and then find ways to evaluate it—whether through human assessment, automated checks, or user feedback signals.
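To make the predictive side concrete, here is a minimal sketch, assuming a binary classification problem (say, churn: 1 = churned, 0 = stayed) and scikit-learn; the labels below are invented purely for illustration. Notice that there is no equally simple, off-the-shelf equivalent for generative outputs, which is exactly the point.

```python
# A minimal sketch of predictive-model evaluation, assuming a binary
# classification task and scikit-learn. The labels are made up for illustration.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # ground truth, observed after the fact
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # what the model predicted

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1 score:  {f1_score(y_true, y_pred):.2f}")
```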
As AI Product Managers, we need to be fluent in these differences. And by fluency, I mean first understanding what type of AI the solution requires, and second, knowing how its performance can be measured.
Let me also clarify one important point. Even though I’ve been in the AI product management field for over ten years, I haven’t worked with every AI type out there. I’m not fluent in measuring the performance of all possible AI solutions. My experience is mainly in building classical prediction and insight-generating models, such as churn prediction and sales forecasting, as well as natural language processing, including large language models, which fall under NLP.
But the fact that I’m aware of the nuances and differences among AI types, and that each requires its own methodology for evaluating performance, helps me make informed decisions about where to focus my next chapters of learning. It also prepares me for tackling new AI product challenges that might require different solution types in the future.
That’s the purpose of this article: to give you greater awareness of these distinctions, so you can navigate the cost and complexity of managing AI products with more confidence.
What Is the Cost of Wrong AI?
When people talk about “AI errors,” they often imagine a chatbot saying something silly or a model predicting a slightly off number. But in the context of internal AI products, the cost of wrong AI is much deeper and far-reaching.
As you might already know, I mainly write about AI product management in the context of building AI products within and for enterprises. In this world, I have built my own latticework of mental models. One of those mental models tells me that regardless of where I want to implement AI in a company, it always touches on one of two core types of processes: those that generate revenue or those that protect revenue.
I have yet to come across a truly separate third category.
So, in this very simplified world of enterprise processes, it quickly becomes apparent that even a single change within one of them either increases or decreases the revenue it generates or protects. And it’s here that the real cost of wrong AI can often be quantified.
Let’s break it down.
1. The Cost of Wrong AI in Revenue-Generating Processes
Imagine an AI model used in a sales forecasting process. Its job is to predict how much revenue each product line will generate next quarter. If that model consistently overestimates demand:
The business might overproduce inventory, tying up capital unnecessarily.
Sales teams might push the wrong products, missing actual market demand.
Marketing budgets could be allocated to lower-impact campaigns.
And the result? Missed revenue targets, higher operational costs, and reduced trust in the analytics or product teams who deployed the model. At least, if anyone ever discovers that it’s your solution causing the problem. But that’s a different topic altogether. :)
2. The Cost of Wrong AI in Revenue-Protecting Processes
Consider fraud detection—a classic example of a revenue protection process. An AI model might analyze transactions to flag suspicious behavior. If the model generates too many false positives:
Legitimate customer transactions get blocked.
Call centers become overwhelmed with complaints.
Customers lose trust and might take their business elsewhere.
I think the point should be clear now.
A Final Example: LLM-Based Tender Assistant
Let’s take one more example—this time from the world of generative AI. Imagine you’re building an internal LLM-based Tender Assistant.
The goal is to help a tender management team quickly analyze and summarize large, complex tender documents from potential partners or clients. On paper, this sounds like the perfect productivity boost. But here’s where things can go wrong:
The LLM might hallucinate facts, inserting details about tender requirements that don’t exist in the original documents.
Important legal or financial clauses might be omitted or misinterpreted in the summary.
The assistant might phrase recommendations too confidently, making users trust outputs without verifying them.
In a tender process, mistakes like these can be costly:
Teams could base their bid strategies on incorrect information.
The company might miss critical compliance requirements.
Misunderstandings could damage relationships with potential clients or partners.
Even if the AI only makes small errors, the cost of cleaning up the mess—through manual document reviews, legal checks, and rework—can wipe out any productivity gains the solution promised. Worse still, if decision-makers lose trust in the assistant, adoption drops, and the entire investment risks becoming shelfware.
This is exactly why the cost of wrong AI goes far beyond just technical performance. In internal enterprise products, it’s about operational disruption, financial risks, and the delicate trust between business teams and the technology they rely on.
The Hidden Costs Behind These Examples
Across all these examples, there’s a common theme:
Errors don’t just produce slightly “off” numbers—they ripple through processes, triggering downstream costs that can far exceed any initial savings promised by AI.
Fixing mistakes often means manual rework and model retraining, and the erosion of stakeholder trust that follows can slow future AI adoption.
This is why, in internal enterprise environments, the cost of wrong AI is rarely just technical. It’s operational, financial, and political.
Understanding where your AI product sits in this landscape—and what processes it touches—is the first step in quantifying the true cost of errors.
You Cannot Avoid Wrong AI, But You Can Mitigate the Risk
By now, you may have realized it yourself: there is no such thing as a perfect AI. It simply isn’t possible.
We use machine learning algorithms for problems where ordinary algorithms fail to deliver a proper answer within a reasonable amount of time. These problems are often so complex that you can’t simply dictate rules for how to handle every single case. There are simply too many variations, exceptions, and edge cases.
Machine learning algorithms, instead of relying on predefined rules written by humans, try to make sense of data and discover as many patterns and rules as possible on their own. But this also comes at a cost.
The cost is that we will inevitably get answers with some degree of error. And this degree of error is something we, as AI product teams, need to keep in mind at every moment.
The most successful AI products are those that incorporate strategies to cope with these errors.
How to Build AI Products Ready for Mistakes
So, how do you build AI products that stay successful despite inevitable errors?
1. Know Where Errors Matter Most
Not every mistake is equally significant. Some errors are merely annoying, while others can trigger real financial, legal, or reputational damage. As an AI Product Manager, your first job is to figure out where errors in your AI system would cause the biggest harm so you can prioritize mitigation efforts where it matters most.
✅ Predictive AI:
Critical when predictions directly drive business actions, like fraud detection, credit scoring, or forecasting.
Errors here can have measurable financial or regulatory consequences.
✅ Generative AI:
Equally important but different in nature. Mistakes often mean hallucinations, factual inaccuracies, or off-brand content.
E.g. a chatbot offering incorrect legal advice, or an image model generating inappropriate visuals.
2. Keep Humans in the Loop
AI alone isn’t enough, especially in high-risk situations. Successful AI products are designed so humans can step in to review, correct, or override AI outputs where necessary. This not only prevents costly mistakes but also builds trust with users who know they’re not entirely at the mercy of the machine.
✅ Predictive AI:
Less common in high-volume, low-risk predictions but critical in high-stakes use cases.
E.g. financial approvals, medical diagnoses, security alerts.
✅ Generative AI:
Essential because generative outputs can be unpredictable and subjective.
E.g. humans reviewing marketing copy, legal summaries, or code before release.
3. Monitor Performance Continuously
AI isn’t static. Models degrade over time as real-world data shifts or new business challenges emerge. Successful AI products have monitoring systems in place to catch drops in performance early, so issues can be fixed before they cause significant harm.
✅ Predictive AI:
Standard practice. Retrain models regularly as underlying data changes.
E.g. changes in customer behavior affecting churn models.
✅ Generative AI:
Also critical, but more complex.
Track hallucination rates.
Monitor factual accuracy.
Watch for toxic or biased outputs.
Tools like automated evals and red-teaming are increasingly used to help.
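To illustrate the spirit of such automated checks, here is a deliberately naive sketch, assuming you already have the source document and the model’s output as plain strings; the helper name, the threshold, and the example texts are all invented. Real eval pipelines are far more sophisticated, but the principle is the same: routinely score every generative output against its source material.

```python
# A deliberately naive sketch of an automated check for "unsupported" content,
# assuming the source text and the model's summary are available as strings.
# This is only a crude proxy for hallucination detection, not a real eval tool.
import re

def unsupported_sentences(source: str, summary: str, min_overlap: float = 0.5) -> list[str]:
    """Return summary sentences whose content words rarely appear in the source."""
    source_words = set(re.findall(r"[a-z0-9]+", source.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in source_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged

source_doc = "The supplier must deliver within 30 days and provide a 24-month warranty."
ai_summary = "Delivery is due within 30 days. The supplier must also offer free training."
print(unsupported_sentences(source_doc, ai_summary))
# -> ['The supplier must also offer free training.']
```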
4. Educate Your Users
A critical part of any AI product’s success is teaching users what the system can and can’t do, how to interpret its outputs, and when to be cautious.
✅ Predictive AI:
Users need to understand that predictions are probabilities, not certainties.
Helps avoid poor decisions based on overconfidence in model outputs.
✅ Generative AI:
Absolutely crucial. Generative outputs can appear impressively fluent yet be entirely wrong.
Users should treat outputs as drafts rather than final truth, and know when to verify information.
5. Design Escape Routes
When AI goes wrong, users need a way out. Successful AI products include features that let users easily reverse decisions, escalate problems, or switch back to manual processes. Designing for graceful failure prevents frustration and loss of trust.
✅ Predictive AI:
Important for high-stakes decisions. Allow manual overrides or alternative workflows.
E.g. letting a human analyst confirm a flagged fraud alert.
✅ Generative AI:
Absolutely essential. Users must be able to reject, edit, or regenerate content.
E.g. a “Regenerate” button for a chatbot answer, or clear disclaimers on sensitive outputs.
6. Quantify Risk and Communicate Transparently
Finally, successful AI product management means being honest about risk. Don’t hide limitations or pretend your AI is perfect. Instead, quantify how often errors occur, what kinds of harm they might cause, and how you’re reducing those risks. Transparency builds trust and helps stakeholders make informed decisions about using AI.
✅ Predictive AI:
Often well-established practice. Stakeholders expect error rates and performance metrics.
E.g. ROC curves, precision-recall trade-offs.
✅ Generative AI:
Needs extra emphasis because errors are less predictable and often subjective.
Stakeholders must understand risks like hallucinations, bias, and tone issues, and the cost of mitigating them.
Applying the Strategies: The Tender Assistant Example
Let’s make this real. Let’s return to the LLM-based Tender Assistant for enterprises from the example above. Its job is to analyze large, complex tender documents and produce useful outputs such as:
Summaries of lengthy legal or technical requirements
Lists of critical compliance obligations
Suggested draft responses for tender submissions
Risk highlights based on tender clauses
On paper, it sounds like a dream tool for efficiency. But here’s where wrong AI can become costly — and how each of our strategies helps manage the risk.
1. Know Where Errors Matter Most
The first step is to pinpoint exactly where mistakes from the Tender Assistant would hurt the business most.
Challenges in the Tender Assistant:
Summaries might omit crucial requirements, leading to non-compliant bids.
The AI could hallucinate requirements that don’t exist in the documents.
Drafted responses might contradict company policy or misstate legal positions.
Applying the Strategy:
Map the tender workflow and identify critical outputs where errors would have legal, financial, or reputational consequences.
Prioritize rigorous checks for those outputs, rather than treating every output equally.
2. Keep Humans in the Loop
No AI model should independently drive high-stakes decisions in tender processes.
Challenges in the Tender Assistant:
Tender content often involves legal, financial, and commercial nuances the AI might not fully grasp.
Users might wrongly assume AI outputs are legally vetted.
Applying the Strategy:
Design the product so all AI outputs are clearly marked as drafts.
Require human review and approval before finalizing summaries or tender responses.
Provide confidence scores or flags for sections the AI is uncertain about.
3. Monitor Performance Continuously
LLMs can degrade in quality over time as business language, legal standards, or tender formats evolve.
Challenges in the Tender Assistant:
The model might perform well initially but start hallucinating or omitting details as document styles change.
Undetected errors could slip into production workflows.
Applying the Strategy:
Establish routine evaluations on fresh tender documents to check:
Hallucination rates
Omission of key clauses
Consistency in legal or technical terminology
Encourage users to report errors and feed these back into model refinement.
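As a hypothetical illustration of the “omission of key clauses” check mentioned above, a first version can be as simple as a hand-maintained checklist of clause keywords that every summary must touch; the clause names, keywords, and example summary below are all invented.

```python
# A hypothetical sketch of a routine "omission" check, assuming the team
# maintains a checklist of clause keywords that every tender summary must
# mention. The clause names, keywords, and summary are invented for illustration.
REQUIRED_CLAUSES = {
    "liability cap": ["liability"],
    "payment terms": ["payment", "invoice"],
    "delivery deadline": ["delivery", "deadline"],
}

def missing_clauses(summary: str) -> list[str]:
    """Return the required clauses with no matching keyword in the summary."""
    text = summary.lower()
    return [
        clause
        for clause, keywords in REQUIRED_CLAUSES.items()
        if not any(keyword in text for keyword in keywords)
    ]

summary = "Payment is due 30 days after invoice. Delivery must be completed by Q3."
print(missing_clauses(summary))  # -> ['liability cap']
```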
4. Educate Your Users
A Tender Assistant seems intelligent and authoritative—but users must remember that LLMs can be confidently wrong.
Challenges in the Tender Assistant:
Users might trust AI outputs without verifying them, especially under deadline pressure.
Teams may assume the AI has legal or commercial authority.
Applying the Strategy:
Train users on:
AI’s limitations
How to spot potential hallucinations
The need to treat outputs as drafts, not final answers
Provide clear disclaimers on every AI-generated summary or recommendation.
5. Design Escape Routes
Users need a way to handle errors gracefully instead of getting stuck with flawed outputs.
Challenges in the Tender Assistant:
Users may waste time editing unusable outputs instead of starting from scratch.
Errors might silently propagate if there’s no easy way to escalate issues.
Applying the Strategy:
Provide:
“Regenerate” buttons for new attempts.
Clear feedback channels to flag problematic outputs.
Options to revert to manual workflows when outputs are unreliable.
Make it simple to trace back outputs to specific document sections for quick verification.
6. Quantify Risk and Communicate Transparently
Stakeholders must understand that while the Tender Assistant can save time, it’s not infallible.
Challenges in the Tender Assistant:
Business leaders may overestimate the AI’s capabilities and push for higher automation than is safe.
Legal teams might worry about liability if outputs are used without checks.
Applying the Strategy:
Quantify:
Average error rates in summaries
Frequency of hallucinations
Time saved versus risk exposure
Communicate trade-offs clearly:
“Using the Tender Assistant saves 60% of drafting time but requires mandatory human review to avoid compliance risks.”
Be honest about what the AI can and cannot guarantee.
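To show what “quantify” can look like in practice, here is a rough back-of-the-envelope sketch; every number in it is hypothetical and would need to be replaced with figures from your own tender process.

```python
# A back-of-the-envelope sketch with purely hypothetical numbers, illustrating
# how to weigh time saved against mandatory review effort.
tenders_per_year = 40
drafting_hours_per_tender = 20   # manual effort today (assumption)
time_saved_share = 0.60          # e.g. "saves 60% of drafting time"
review_hours_per_tender = 4      # mandatory human review (assumption)
hourly_cost = 80                 # fully loaded cost per hour (assumption)

gross_savings = tenders_per_year * drafting_hours_per_tender * time_saved_share * hourly_cost
review_cost = tenders_per_year * review_hours_per_tender * hourly_cost
net_savings = gross_savings - review_cost

print(f"Gross savings: €{gross_savings:,.0f}")  # €38,400
print(f"Review cost:   €{review_cost:,.0f}")    # €12,800
print(f"Net savings:   €{net_savings:,.0f}")    # €25,600
```

Even a crude calculation like this makes the trade-off discussable: the mandatory review step is not overhead to hide, but a cost line that belongs in the business case.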
Final Thoughts
The Tender Assistant example makes one thing clear: it takes significant effort to build an AI product that truly serves users’ needs while managing the risks of wrong AI outputs in a timely and appropriate way.
I’ve become very cautious about which AI product ambitions are worth pursuing. Too often, we don’t fully see the hidden costs and side effects at the beginning. What looks like a potential million-euro opportunity can quickly require millions to build, fine-tune, and maintain. In the end, there might not be much value left on the bottom line—especially if the original business case was based on the wrong assumptions about efficiency gains.
If too much human oversight is required to validate AI outputs, that effort needs to be factored into the business case from the very start. Otherwise, we risk building products that look impressive but fail to deliver meaningful returns.
At the end of the day, a deliberate assessment, involving the right experts at the right time—and crucially, at the very beginning of each initiative—is absolutely essential.
Ultimately, the best AI products expect errors. And that’s exactly why they succeed.
JBK 🕊️