Do you really know what Bias is? (Part 2/3)
Exploring the Depths of Algorithmic Bias - Can we ever have a bias-free AI Product?
#beyondAI
Recap - The previous episode gave us enough traction to delve deeper into the topic of bias. We learned that there is a common understanding of bias that people associate with the term AI Bias, but it still needs to be broken down in a way that ensures not only our stakeholders but also we ourselves understand what we are talking about.
We also covered why understanding bias is crucial for AI Product Managers: it helps in our daily interactions with stakeholders.
Now - it’s time to explore the remaining parts:
What is Algorithmic Bias?
What is ML (or Statistical) Bias and its relation to AI Bias?
And the question of who is actually responsible for AI Bias?
Enjoy!
To create AI that forges a better future,
not one that merely reinforces the existing state of affairs.
What is Algorithmic Bias?
In the previous post, I mentioned that some argue Algorithmic Bias is primarily due to underlying data bias. However, ML algorithms themselves can act as amplifiers of these existing biases within the data. They can also introduce their own biases, even in the absence of biased data. This sheds new light on how we should mitigate bias while developing our AI models.
Algorithmic Bias also emerges from the way algorithms process data, learn from it, and make decisions, which can be independent of the data used. It concerns the 'decision-making logic' of AI systems themselves, which can inadvertently favor certain outcomes over others, even if the data is balanced.
💡 Time for an Example
Imagine we have an AI system designed to approve or deny loan applications. The system evaluates applications based on multiple factors, such as income, employment history, credit score, and requested loan amount. During training, it learns to weight employment history significantly more than the other factors, having identified employment stability as the most critical indicator of a person's ability to repay the loan.
What about the data?
The data used to train the AI system is perfectly balanced across different demographics - equal representation of age groups, genders, ethnic backgrounds, and socio-economic statuses. There's no inherent bias in the data itself; every group has people with varying income levels, employment histories, credit scores, and loan amounts.
Despite the balanced data, the AI system exhibits a bias towards applicants with long, uninterrupted employment histories. This decision-making logic inadvertently disadvantages certain groups:
Younger applicants who might not have had the opportunity to build a lengthy employment history.
Individuals who have taken career breaks for various reasons, such as education, parental leave, or personal health issues.
People in industries with more frequent job changes or gaps in employment, which can be common in certain fields.
In this example, the bias doesn't stem from the data but from the decision-making logic of the AI system - specifically, the overemphasis on employment history. This illustrates how algorithmic bias can occur even when the input data is balanced.
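To make this tangible, here is a minimal sketch with purely synthetic data and illustrative feature names (nothing here comes from a real lending system): even on balanced data, the learned decision logic can end up concentrating on a single feature.

```python
# Sketch: a model trained on balanced, synthetic loan data still ends up
# leaning almost entirely on employment history.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 10_000

# Illustrative features: income, credit score, employment history (years).
income = rng.normal(50_000, 15_000, n)
credit_score = rng.normal(650, 80, n)
employment_years = rng.exponential(5, n)

# Repayment correlates most strongly with employment history here,
# so the model learns to over-weight that single feature.
repaid = (employment_years + rng.normal(0, 1, n) > 5).astype(int)

X = np.column_stack([income, credit_score, employment_years])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so weights are comparable

model = LogisticRegression().fit(X_std, repaid)
print(dict(zip(["income", "credit_score", "employment_years"], model.coef_[0])))
```

The employment_years coefficient dominates, so younger applicants and people with career breaks are systematically disadvantaged, even though no demographic attribute was ever part of the data.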
A standout paper titled Moving beyond “algorithmic bias is a data problem”, authored by Sara Hooker, underscores the significance of algorithmic bias extending beyond mere data bias and articulates this point with clarity:
💡 Btw, I didn’t select her work solely because the title starts with "Moving Beyond…", which aligns so well with my #beyondAI philosophy.
Okay, maybe a bit.
Oh wait, does this count as a form of selection or confirmation bias? 😜
A surprisingly sticky belief is that a machine learning model merely reflects existing algorithmic bias in the dataset and does not itself contribute to harm. Why, despite clear evidence to the contrary, does the myth of the impartial model still hold allure for so many within our research community? Algorithms are not impartial, and some design choices are better than others. Recognizing how model design impacts harm opens up new mitigation techniques that are less burdensome than comprehensive data collection.
Sara Hooker - VP Research, Head of Cohere For AI and Ex-Google Brain Research Scientist
💡 From a product management perspective this is of outstanding importance:
"Recognizing how model design impacts harm opens up new mitigation techniques that are less burdensome than comprehensive data collection"
If we can remain cost-efficient while reducing bias, then it's certainly worth pursuing. The most significant expenses in AI development are not the model itself but data collection and preparation.
I can only confirm this: among the many Data Scientists I've worked with over the last decade, the prevailing opinion is that there's nothing we can do on the algorithmic side to avoid bias.
However, looking closer, we find that most standard ML algorithms are primarily designed to optimize for accuracy.
You tune your parameters for accuracy,
you select the best loss function for accuracy,
and you define the number of iterations for accuracy.
Data Scientists often prioritize this because it is frequently their main objective, as dictated by their stakeholders.
For instance, this could involve increasing the accuracy of object recognition, churn prediction, sentiment analysis, or any other application you might think of.
Stakeholders might be unaware of these intricacies, but we, as AI Experts, should be well informed. We should recognize that our decisions to increase accuracy can sometimes impact fairness.
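As a hedged illustration of what such an accuracy-only workflow often looks like, here is a minimal scikit-learn sketch (synthetic data; the model and parameter grid are just examples, not a recommendation):

```python
# A typical tuning loop: every knob below is turned for accuracy alone.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2_000, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={
        "learning_rate": [0.01, 0.1],  # "tune your parameters for accuracy"
        "n_estimators": [100, 300],    # "define the number of iterations for accuracy"
    },
    scoring="accuracy",                # the only objective in sight
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Nothing in this setup measures fairness; whatever the search declares "best" is best only in the accuracy sense.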
Algorithmic Bias is not ML (or statistical) Bias 🤪 and still…
Understanding this is so crucial that I can't help but delve into this topic. And, to add further complexity, I want to introduce you to a new type of "bias" – a type with which Data Scientists are well-acquainted.
😨 OMG… If I fail to make my point clear with this series, at least I'll rank high on Google for the keyword “BIAS”.
Let’s first recap:
AI Bias
AI Bias broadly refers to patterns where AI systems exhibit prejudiced outcomes due to flaws in their design, development, or deployment. These biases can manifest in ways that unfairly benefit or harm certain groups or individuals. This concept aligns with the understanding most people have when we simply discuss "Bias."
Algorithmic Bias
Algorithmic Bias contributes to AI Bias and refers specifically to biases that emerge from the algorithms underpinning AI systems, as well as from the data used to train the AI model. These biases can arise from how an algorithm processes data, the assumptions its creators have made, or the interactions between the algorithm and the data it processes.
Data Bias
Data Bias indirectly (through Algorithmic Bias) contributes to AI Bias and occurs when the dataset used to train an AI system is not representative of the real-world scenario it aims to model. This lack of representation can lead to skewed outcomes, as the data may overrepresent certain groups or phenomena while underrepresenting others. Equally important is the scenario where the dataset accurately reflects real-world situations, including the human biases present in them.
ML (statistical) Bias
And now, in machine learning, "bias" is a technical term that doesn't necessarily carry a negative connotation; it's simply part of how models are evaluated and tuned.
I'm touching on ML Bias because some of you might be familiar with the concept of the Bias-Variance Tradeoff. The goal of ML algorithms is to balance bias and variance so that the total prediction error is minimized. Essentially, the tradeoff is between achieving high accuracy and ensuring robust generalization. The ideal is to find a sweet spot where the model is sufficiently accurate on known data while also performing well on data it hasn't seen before (good generalization).
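If you want to see the tradeoff rather than just read about it, here is a minimal sketch (synthetic data, with polynomial degree as a stand-in for model complexity; none of this comes from a specific paper or product):

```python
# Bias-variance tradeoff sketch: a too-simple model underfits (high bias),
# a too-complex one overfits (high variance).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree={degree:2d}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```

Training error keeps falling as the model gets more complex, but at some point the test error rises again: that widening gap is variance taking over from bias. Notice that fairness appears nowhere in this loop.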
Is ML Bias contributing to AI Bias?
As you can see, the bias-variance tradeoff is concerned with accuracy and generalization, not fairness, and most algorithms are designed solely to balance these two objectives. When we add fairness as a metric, we must find a balance between accuracy, generalization, and fairness.
As if balancing two factors wasn't difficult enough, we now have a third to consider. I'm not certain about how far research has progressed in this field, but I can imagine that it's not an easy task.
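That said, to give at least a hedged glimpse of what "fairness as a third metric" can look like in practice: the open-source fairlearn library wraps a standard estimator and optimizes accuracy subject to a fairness constraint. A minimal sketch with synthetic data (assuming fairlearn is installed; the features and the sensitive attribute are purely illustrative):

```python
# Sketch: accuracy vs. fairness with fairlearn's reductions approach.
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.metrics import demographic_parity_difference
from fairlearn.reductions import DemographicParity, ExponentiatedGradient

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 3))
group = rng.integers(0, 2, 1_000)              # illustrative demographic attribute
y = ((X[:, 0] + 0.5 * group) > 0).astype(int)  # outcome correlated with the group

baseline = LogisticRegression().fit(X, y)
mitigated = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigated.fit(X, y, sensitive_features=group)

for name, pred in (("baseline", baseline.predict(X)),
                   ("mitigated", mitigated.predict(X))):
    print(name,
          f"accuracy={(pred == y).mean():.3f}",
          f"demographic parity diff="
          f"{demographic_parity_difference(y, pred, sensitive_features=group):.3f}")
```

In this toy setup, the mitigated model typically gives up a little accuracy in exchange for a much smaller demographic parity gap, which is exactly the three-way balancing act described above.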
All this is explained to make a point and to tie back to the section title:
Algorithmic Bias is not the same as ML Bias, so ML Bias on its own should not contribute to AI Bias. And yet, ML Bias can be seen as a proxy: an interface between Data Bias and Algorithmic Bias.
🤪 Please forgive me.
If you are confused, it gets a bit worse now.
Reducing ML bias to enhance accuracy on a dataset that is already biased (from a fairness standpoint) leads to a paradoxical scenario: we end up with a model that is perfectly unbiased with respect to a dataset full of biases. 😂
On the flip side, reducing ML bias when working with an unbiased dataset means the algorithm homes in on whatever it deems most significant, like the employment history in the example we discussed. This focus, albeit well-intentioned, may inadvertently pave the way for new biases, subtly embedding unfairness where none existed before.
🤓 Did I say: I'm sorry?
Essentially, I want to convey that:
ML algorithms focused on balancing bias-variance metrics will adopt dataset bias. If no bias exists, they might introduce their own bias.
The only way to overcome this is to choose ML algorithms that allow for the addition of fairness as a third metric. However, I fear this might result in some loss of accuracy.
It seems we can’t win in this game!
But perhaps, we are viewing this through too narrow a lens.
Is increasing fairness at the cost of potentially lower accuracy really a loss?
Do you remember this?
A surprisingly sticky belief is that a machine learning model merely reflects existing algorithmic bias in the dataset and does not itself contribute to harm.
I just think the sticky belief persists because all these biases, coming from different contexts yet still somehow related, make it hard for anyone to give up on it.
Even scientists get confused.
I should start a petition to forbid the word bias 😂
So whenever you see a data scientist struggling hard to increase a model's accuracy, be curious and ask, "At what cost?" If they start showing you the computational costs, steer the conversation toward fairness instead. This might at times be necessary. 😉
Now, this was difficult.
You can be proud you did it so far.
I'm exhausted, and I can imagine that this second episode wasn't that easy to understand. It wasn't for me, at least.
The world isn't easy 🌎, but it's worth giving it a chance!
And still, we aren’t finished yet.
Who is responsible for Bias in AI Products?
To avoid finger-pointing, it is best if companies first understand how bias is caused. I believe that once the underlying concept of bias is understood, everyone will immediately be able to answer this question.
To see what my LinkedIn professional community thinks about this, I started a survey. It is definitely not a representative or scientifically conducted survey, but it should at least give us an indication of how professionals in the data and AI space currently think about it.
Data architects and engineers, data scientists and analysts, AI and data product managers, AI managers and consultants, and other AI and data professionals from various companies, including Microsoft, Amazon, Google, Deloitte, Accenture, Ernst & Young, Deutsche Telekom, and Vodafone, voted.
Over 65% see the responsibility as part of every AI expert's job (except the office dog, of course).
This result leads me to interpret that the majority sees tackling bias as too huge an issue for just one person or role, and I have to agree.
Yet, nearly all AI products available today capture and even reinforce existing human biases. From this, I conclude that either many organizations haven't yet implemented a dedicated process to reduce bias in their AI products, or we will never be able to eliminate bias completely.
Using what we have learned so far regarding data and algorithmic bias, bias in AI products can occur at various steps in both the data pipeline for preparing the training dataset and the AI pipeline for producing the AI model.
Implementing fairness often requires a multidisciplinary approach, including legal, ethical, and societal perspectives, alongside technical model development. It's about extending beyond the numbers and into values, ensuring that AI tools augment human decision-making without perpetuating or exacerbating existing inequalities.
But, even if a completely bias-free AI product seems unattainable, we must tirelessly work towards minimizing biases, ensuring our technologies are equitable and inclusive.
To understand exactly where different types of biases arise when building AI products, and why so many believe that only the office dog 🐶 is free from sin, I am sorry to inform you that I'll need one more episode 😆.
But we will also learn how Jeffrey Steinkeller, CEO of Steiny AI, Inc., raises awareness about bias. I will share my views on mitigating it and discuss what this conversation about bias means for AI Product Managers.
I hope to see you then.
JBK 🕊
#AIProductManagement