This article is the first in a series that explores the different roles needed to build successful internal AI products. We will look at what each role contributes, what it does not, and when it truly belongs on the core team. In AI product development, assembling the wrong team is not just inefficient. It is expensive, misleading, and one of the most common reasons AI initiatives fall short. And it happens more often than we like to admit.
This is especially true in corporate settings, where roles are staffed because someone is available, or because someone said, “AI needs a Data Scientist.” But that is the wrong lens. AI does not necessarily need a Data Scientist. A specific type of AI product does. And unless AI Product Managers can tell the difference, we will either over-engineer simple use cases or under-staff the hard ones. That is why it is essential to understand what each role really brings, and when it is critical to delivery, to trust, or to both.
Let’s start with one of the most misunderstood ones: the Data Scientist.
The general role of a Data Scientist: What they’re trained to do
A Data Scientist’s job is not just to “do AI.” It is to make sense of data that is often messy, often incomplete, and often overwhelming, and to turn it into something you can act on. That might mean discovering new insights from customer behavior. It might mean building a model that predicts what users will do next. Or it might mean designing experiments to understand why a system behaves the way it does.
Their toolkit includes:
Statistical modeling and hypothesis testing
Exploratory data analysis (EDA)
Feature engineering and variable selection
Machine learning algorithm development and tuning
Model evaluation and validation
Interpretability and fairness techniques
They combine math, code, and critical thinking to move from data to decision logic. But that does not mean every team building with AI needs them.
What a Data Scientist actually contributes to internal AI products
Inside a company, building AI products is not just a technical challenge. It is a business challenge shaped by organizational complexity. Data is spread across silos. Expectations are vague. Timelines are tight. And trust is everything. In this context, the Data Scientist brings a set of capabilities no other role quite covers.
1. They bridge messy data and model-ready inputs
Internal data is rarely clean. It comes from legacy systems, manual processes, and diverse domains. A Data Scientist does not just take what’s given. They question it. They ask:
What does this column really mean?
Is this bias or signal?
Are we modeling the right problem or just what is easiest to compute?
Their ability to structure, clean, and translate that data is foundational. Because if you do not start from solid ground, no AI model, no matter how sophisticated, will deliver the impact you are looking for.
2. They build models tailored to internal realities
Most off-the-shelf models are trained on public datasets. But internal problems often require internal logic. Whether it is scoring leads based on sales history, predicting churn for B2B accounts, or classifying support tickets with enterprise-level nuance, these are not problems you solve with generic APIs. Data Scientists build models that reflect:
Internal processes
Business rules
Customer behaviors specific to your organization
3. They protect your product’s credibility
An AI solution can fail even if it works — if no one trusts it. That is where a Data Scientist makes a quiet but critical difference. They:
Test for fairness
Quantify uncertainty
Simulate how the model behaves under different conditions
Help stakeholders interpret the model’s decisions
In internal environments, where decisions affect teams, budgets, and customers, trust is essential. Without explainability, adoption stalls.
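To make "quantify uncertainty" concrete, here is a minimal sketch of one common approach: a percentile bootstrap confidence interval around a model's accuracy. The function name, the labels, and the predictions are all made up for illustration; a real evaluation would use your own held-out data and probably more than one metric.

```python
import random

def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]

    # Resample the per-example outcomes with replacement and recompute accuracy.
    scores = []
    for _ in range(n_resamples):
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)
    scores.sort()

    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper

# Hypothetical labels and predictions for 20 accounts (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]

point, lower, upper = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy={point:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Reporting the interval alongside the point estimate gives stakeholders a sense of how much the number could move on a different sample, which is often what the trust conversation is really about.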
4. They turn AI discovery into AI strategy
Even before you have a clear product, you often have data. And before you know what to build, you need to understand what is happening beneath the surface. A Data Scientist helps you explore patterns, test early hypotheses, and generate strategic questions like:
Which customers are most affected by late delivery?
What behaviors lead to high support ticket volume?
Is there a predictable signal before a churn event?
They do not just help you build the product. They help you define what should be built.
When you need a Data Scientist on your core team
You do not need a Data Scientist for every AI initiative. But when you do, no other role can replace them. Here is when their presence goes from nice to have to essential.
You’re developing your own model
If your AI product involves training a custom model — for churn prediction, time-series forecasting, NLP classification, or anything similar — you need a Data Scientist. Even if you are fine-tuning an existing one, you will need their help with:
Feature engineering
Hyperparameter tuning
Model validation and selection
Performance benchmarking
Without this expertise, you are making educated guesses. And guessing with business-critical data is risky.
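As a small illustration of performance benchmarking, the sketch below compares a model's accuracy with a naive majority-class baseline. The labels and predictions are invented, and accuracy is only one of several metrics a Data Scientist would look at, but the idea carries over: a model is only worth its complexity if it clearly beats the simplest alternative.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train, n_predictions):
    """Always predict the most frequent label seen in training."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n_predictions

# Hypothetical churn labels (1 = churned) and model output, for illustration.
y_train = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_test = [0, 1, 0, 0, 1, 0, 0, 1]
model_pred = [0, 1, 0, 0, 0, 0, 1, 1]

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)
lift = model_acc - baseline_acc

print(f"baseline={baseline_acc:.3f}, model={model_acc:.3f}, lift={lift:.3f}")
```

If the lift is small or negative, that is a signal to revisit features or problem framing before investing in tuning.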
Your data is complex or proprietary
Internal data is rarely plug-and-play. It is fragmented, shaped by human behavior, and full of edge cases. A Data Scientist can:
Identify biases or data gaps
Select the right features and formats
Handle imbalance or missing values
Engineer variables that reflect actual business logic
This becomes essential when internal systems, roles, and processes do not follow clean structures.
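Two of these tasks are easy to show in miniature. The sketch below fills missing values with the median and computes inverse-frequency class weights for an imbalanced label. The column names and numbers are hypothetical, and median imputation is just one option among many; whether it is appropriate depends on why the values are missing.

```python
from collections import Counter
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical account data: logins per month (with gaps) and a rare churn label.
monthly_logins = [12, None, 7, 30, None, 9]
churned = [0, 0, 0, 1, 0, 0]

clean_logins = impute_median(monthly_logins)
weights = balanced_class_weights(churned)

print(clean_logins)
print(weights)
```

The weighting formula gives the rare class more influence during training, so a model cannot score well simply by predicting the common outcome.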
You’re building products for insight, not just automation
Many internal AI products are not built to make decisions. They are built to uncover insights. These are discovery or sense-making use cases, and they need Data Scientists to make the results meaningful.
Examples include:
Customer segmentation models
Behavioral clustering (e.g., app usage or sales rep activity)
Root cause analysis of KPI trends
Pattern or trend detection in network data
Journey mapping based on events or actions
Without a Data Scientist, these products remain shallow. With one, they generate value across teams.
Your product must be explainable and auditable
In regulated industries, or anywhere that decisions must be reviewed or audited, transparency is non-negotiable. A Data Scientist ensures:
Traceable logic and clear decision paths
Documentation of how predictions are made
Use of techniques like SHAP or LIME
Monitoring for fairness, accuracy, and drift
This is not just for compliance. It builds confidence.
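Drift monitoring, for instance, can start very simply. The sketch below computes the Population Stability Index (PSI), a widely used measure of how far a current score distribution has moved from a reference one. The bin edges and scores are invented for illustration, and any alerting threshold is a judgment call your team would need to agree on.

```python
import math

def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI between two samples, using ascending bin edges.

    Values below edges[0] fall in the first bin; values at or above
    edges[-1] fall in the last bin.
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    exp_p = proportions(expected)
    act_p = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_p, act_p))

# Hypothetical model scores at launch and a few months later.
edges = [0.25, 0.5, 0.75]
launch_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
recent_scores = [0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]

stable = population_stability_index(launch_scores, launch_scores, edges)
drifted = population_stability_index(launch_scores, recent_scores, edges)

print(f"PSI vs itself: {stable:.3f}")
print(f"PSI vs recent: {drifted:.3f}")
```

A value near zero means the two distributions look alike; a large value is a prompt to investigate whether the model is still seeing the kind of data it was validated on.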
You’re in a discovery-heavy phase
Some products begin not with specifications, but with questions. In this case, a Data Scientist is the person who helps test feasibility and shape hypotheses. They can:
Analyze opportunity areas
Simulate outcomes
Estimate model performance
Clarify whether the use case is worth building
They help turn ambiguity into informed direction.
When you don’t need a Data Scientist on your core team
Not every machine learning–powered product requires a Data Scientist from the start. In some cases, their core skill set — model development, data exploration, statistical reasoning — may not be critical to delivering value early on. Here’s when that’s likely to be the case.
You are using foundation models via API, without custom training
If your AI product relies on foundation models (like GPT, Claude, or Gemini) for capabilities such as summarization, semantic search, classification, or generation, and you are not fine-tuning or training on your own data, then a Data Scientist is not immediately necessary. What you are doing is orchestration and application, not model innovation.
You’ll likely need:
A Prompt Engineer (if this is even a role) to structure interactions
An AI Engineer to handle retrieval or context enrichment
A Designer to build usable workflows on top of the model
Until you reach a point where you need to analyze performance or fine-tune with internal data, a Data Scientist would have little to contribute.
You are using pre-trained models for narrow ML tasks
Some internal products embed pre-trained models that perform a very specific ML task — like image classification, sentiment analysis, or language detection. If these models perform well enough and do not require retraining, you can build valuable products around them without needing a Data Scientist.
Examples:
Using an email sentiment classifier to route internal tickets
Applying an OCR model to extract structured data from documents
Leveraging a pre-trained keyword extractor to tag customer interactions
The model is already built. As in the foundation model case above, your work is productization, not modeling.
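Here is a rough sketch of what that productization work can look like for the ticket-routing example. The classifier is a hypothetical stand-in (a keyword check, not a real model), and the queue names and confidence threshold are assumptions; in practice you would plug in whatever pre-trained model you have chosen.

```python
from typing import Callable, Tuple

def route_ticket(
    text: str,
    classify: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.8,
) -> str:
    """Pick a queue from a (label, confidence) prediction."""
    label, confidence = classify(text)
    if confidence < min_confidence:
        return "human-triage"  # do not act on low-confidence predictions
    return "priority-queue" if label == "negative" else "standard-queue"

def fake_sentiment_model(text: str) -> Tuple[str, float]:
    """Stand-in for a pre-trained classifier; NOT a real model."""
    if "broken" in text.lower():
        return ("negative", 0.93)
    return ("neutral", 0.55)

urgent = route_ticket("My laptop is broken again", fake_sentiment_model)
unclear = route_ticket("Quick question about badges", fake_sentiment_model)

print(urgent)
print(unclear)
```

Notice that nothing here involves training. The decisions that matter are product decisions: what to do with low-confidence predictions and where each label should send the ticket.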
You are still validating the problem and don’t need model development yet
If your AI product is still in early discovery, your main question is whether the problem is real, valuable, and solvable with ML. If you can answer that without heavy data analysis, you may not need a Data Scientist immediately. What you need first is:
A clear problem framing
An understanding of available data sources
A simple prototype to test workflows or model fit
You might work with an AI Engineer or generalist to explore feasibility using basic, non-ML models or prebuilt tools. A Data Scientist can come in once the product vision matures.
You need model orchestration, not model creation
Many internal AI products rely on combining multiple ML components — retrieval, embedding search, pre-trained classification — but do not involve building or training new models. The complexity lies in gluing pieces together, not in discovering patterns.
Common examples:
Retrieval-augmented generation (RAG) for internal knowledge assistants
Multi-step workflows using off-the-shelf models
Semantic search powered by vector embeddings
These products still use ML, but they are integration-heavy, not data science–driven.
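To show how little modeling is involved, here is a minimal sketch of the retrieval step behind semantic search: rank documents by cosine similarity between a query vector and stored vectors. The three-dimensional vectors and document names are toy placeholders; a real system would get its vectors from an embedding model and usually store them in a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy vectors standing in for real embeddings (illustrative only).
index = {
    "vpn-setup": [0.9, 0.1, 0.0],
    "expense-policy": [0.0, 0.2, 0.9],
    "remote-access-faq": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # pretend this encodes "how do I connect from home?"

results = top_k(query, index, k=2)
print(results)
```

In a RAG assistant, the retrieved documents would then be passed to the foundation model as context. The engineering effort goes into chunking, indexing, and wiring, which is why these products lean on AI Engineers more than on Data Scientists.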
Your evaluation focus is on business metrics, not model performance
If your current goal is to test whether the product drives business value or user adoption, rather than tuning model performance, a Data Scientist is not the most urgent role. You may be testing:
Whether users trust the AI assistant
Whether recommendations improve outcomes
Whether response time or accuracy meets baseline needs
Until model quality becomes the limiting factor, other roles will move the product forward more effectively.
What I’ve learned from building internal AI products
We often debate roles. But the more honest question is: What does the product need to succeed?
The title “Data Scientist” may sound generic. But their work is anything but. They are not just model builders. They are pattern finders, uncertainty reducers, and sometimes the only person in the room who understands whether your assumptions are statistically sound. When the product needs that, they are the right person. When it doesn’t, they’re not.
The goal isn’t to staff roles based on trends.
The goal is to solve the right problem with the right capabilities.
Most AI products today don’t need a Data Scientist, and that’s not an insult
We are in a wave of AI product launches. Internally. Externally. Everywhere. But most of them share a pattern. They are:
Wrappers around foundation models
RAG-based assistants powered by vector search
Applications using off-the-shelf models for classification or summarization
They are products built on top of pre-trained intelligence, not new intelligence developed from scratch. Which means they succeed or fail based on:
Prompt design
Workflow orchestration
User experience
Data integration and context
In this context, a Data Scientist is often not the limiting factor. The real blockers are adoption, alignment, or usability. So no—most AI products today don’t need a Data Scientist. They need strong engineers, great designers, and clear product thinking.
But that’s not the end of the story.
The hidden strength of the misunderstood role
For years, Data Scientists were treated as the “unicorns” of AI. Everyone wanted one. Few knew what they were actually supposed to do. Expectations were sky-high: build the model, explain it, deploy it, make it scalable, make it usable, make it compliant—and do it alone.
As a result, many Data Scientists developed much broader skills than their job title suggests.
You’ll often find Data Scientists who:
Write production-grade code
Design evaluation pipelines
Build dashboards and reporting layers
Tune prompts and experiment with LLM-based architectures
Manage experiments or run early-stage product discovery
So before we decide whether a Data Scientist belongs on the team, we should ask a more nuanced question: Can this person cover part of what we need, even if their title says otherwise?
Not every company has all the roles they need on paper. But sometimes, they already have someone who can fill the gap—quietly, competently, and creatively.
We rarely get to staff ideal teams.
But with the right awareness and flexibility, we can still build the right products.
JBK 🕊️
If you found this useful or want to share your approach to building internal AI teams, let’s talk. I’d love to hear what you’ve tried, what worked, and what didn’t.
Totally agree—just worth calling out:
Problem space first.
Then data.
Then science.
Data’s not the moat, it’s the mud. Too many teams loiter in warehouses praying for a eureka moment.
Sharp AI starts with a sharper question.
Everything else is noise.