I've worked as a data analyst at companies like Amazon for 20 years. Using ChatGPT for data analytics is a risky move — AI can't do the work we do.
- Brandon Southern is the former head of analytics at eBay, Amazon, and GameStop.
- He says that using ChatGPT for data analytics is a risky move.
I frequently see posts from people suggesting ChatGPT can easily be used for data analysis. I also hear daily comments suggesting that ChatGPT will replace data analyst jobs.
As a former head of analytics at Amazon and a 20-year tech and analytics veteran, I'm not all that concerned about ChatGPT replacing data analyst jobs.
But what I am concerned about is leaders believing that it can or should replace data analysts' jobs.
This would be a disastrous decision given the current state of analytics environments and how ChatGPT works. Using ChatGPT as a replacement for data analysts will accelerate poor decision-making and propagate bad data across corporate networks at a rate that we've never seen before.
To ensure decision quality, leaders must avoid putting the cart before the horse when it comes to AI tools and models such as ChatGPT.
They must first focus on fully understanding the challenges within their analytics environments and resolve those challenges before building automated tools on top of these environments. Failure to do so will almost certainly result in poor decisions, corporate and financial risk, and erosion of trust by employees and customers.
ChatGPT is basically a predictor
ChatGPT is a large language model, which sounds fancy, and the responses that it provides have users feeling like it's an intelligent being. But it's far from that. At a basic level, the model takes a set of information as an input and then predicts an outcome based on those inputs.
To provide an accurate prediction, the model must learn from several different scenarios and be given accurate information. The model will respond with an answer that will be correct or incorrect depending on how it was trained. If you train the model to believe that 2 + 2 = 10, it will give you that answer, even though it is wrong.
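To make that point concrete, here's a minimal sketch in Python using scikit-learn. It is nothing like ChatGPT internally, and the data and model are invented for illustration, but it shows the basic dynamic: a predictive model reproduces whatever mapping it was trained on, right or wrong.

```python
# Toy illustration: a model trained on a wrong label repeats the wrong answer.
# (Hypothetical example; assumes scikit-learn is installed.)
from sklearn.tree import DecisionTreeRegressor

# Training data: pairs of numbers and their "sums".
# One label is deliberately wrong: we teach the model that 2 + 2 = 10.
X = [[1, 1], [1, 2], [2, 3], [3, 3], [2, 2]]
y = [2, 3, 5, 6, 10]  # the last label should be 4

model = DecisionTreeRegressor().fit(X, y)

print(model.predict([[2, 2]]))  # -> [10.], the answer it was taught, not the right one
```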
This is why using such models is a major problem for analytics teams.
The problem is that we have multiple sources of truth
I've worked at Amazon, eBay, GameStop, VMWare, and a handful of start-up companies over the last 20 years. I've also provided consulting and advisory services to many others in the analytics field. What I've learned is that regardless of the company size, age, or industry, they all face the same problems, and one of these problems has to do with multiple sources of truth.
For example, a report that the finance team uses says that the company has 10,000 new customers. But a report that the marketing team uses says that the company has 12,000 new customers. Who is correct? Is either of those figures correct? Is there different context for each of those figures where both reports could be correct?
We don't know, at least not without a significant amount of investigation.
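Here's a hypothetical sketch of how this happens in practice. The column names and definitions are invented, but the pattern is real: two teams query the same raw data with different unstated definitions of "new customer," both queries run without error, and the reports disagree.

```python
# Hypothetical example of "multiple sources of truth" (assumes pandas is installed).
import pandas as pd

signups = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "signup_date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-02", "2023-02-10", "2023-02-15"]),
    "first_purchase_date": pd.to_datetime(
        ["2023-01-07", None, "2023-02-03", None, "2023-02-20"]),
})

# Finance's definition: a "new customer" signed up in February AND made a first purchase.
finance_new = signups[
    (signups["signup_date"].dt.month == 2) & signups["first_purchase_date"].notna()
]

# Marketing's definition: a "new customer" is anyone who signed up in February.
marketing_new = signups[signups["signup_date"].dt.month == 2]

print(len(finance_new))    # 2 "new customers" on finance's report
print(len(marketing_new))  # 3 "new customers" on marketing's report
```

Neither query is wrong on its own terms. The disagreement lives in the unstated definitions, which is exactly the context a model trained on these outputs would never see.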
And that's the current state of analytics teams, without trying to apply machine learning models on top of the current environment.
ChatGPT can't do the work of a data analyst
Data analysts today don't have accurate, consistent data, or the proper context for that data, in their existing reports, dashboards, and databases.
If data analysts can't agree and multiple sources of truth are used to train ChatGPT or other large language models, we can't expect the model to produce accurate results.
Unfortunately, stakeholders are already using this inconsistent information through self-service dashboards and reports, many times without the help of data analysts. The saving grace in today's environment is the protection offered by data analysts: they provide additional context to data output, and they navigate these data issues in real time as they construct their analyses.
As self-service capabilities increase, data analysts are becoming less involved with assisting stakeholders with data interpretation.
This has created a slippery and dangerous slope for the organization as more stakeholders take facts from various dashboards without context provided by an analyst or without additional quality-assurance checks. It's a situation that will be accelerated by the ease of use of tools like ChatGPT and blind trust in their results, without an understanding of the challenges that data-driven organizations face today.
Leaders should understand the risk they're taking
If leaders subscribe to the notion of ChatGPT being a viable internal analytics solution for their organization, without first resolving the challenges within their analytics organizations, they will be putting the company at risk.
Here's how the situation would likely progress:
- Individual teams or the entire company would decide that ChatGPT is a valid solution and implement it.
- They would train the model on today's internal data, even though proper context is lacking and current reports produce inconsistent results. This would likely happen without team members even realizing that these issues exist today.
- Easy access would be granted to ChatGPT, much like the access team members already have to Tableau and Power BI dashboards.
- Stakeholders would ask ChatGPT questions as they do today, but now without the clarifying follow-up questions that data analysts ask. These questions are vital to receiving the proper answer but are frequently taken for granted and overlooked in stakeholders' initial requests.
- Stakeholders would likely take the most efficient route — a non-verbose prompt — without realizing the potential for inaccurate results.
The public has provided sufficient evidence that people don't want to read or write long emails. Instead, they prefer quick chat messages, even going as far as to use emojis instead of explicit words.
I've seen the same thing over the last 20 years in verbal conversations with stakeholders, both in my own discussions and in my team members' conversations with them.
Sufficient and detailed information and requirements are almost always lacking from stakeholder requests.
Because of these issues, not only will the models receive a bad education and risk producing inaccurate results even with the most accurate and detailed prompts, but there's also a high probability that the prompts themselves won't be sufficient.
There's a real risk in ChatGPT being used and blindly trusted within organizations.
Brandon Southern is the former head of analytics at eBay, Amazon, and GameStop. He also creates TikToks about data analytics and career development.