Uh-oh — it looks like ChatGPT's AI model got lazy again
- OpenAI's top model has a problem: It seems to keep getting lazy.
- Users of GPT-4 have taken to OpenAI's developer forum to complain.
OpenAI's GPT-4 seems to have gotten lazy — again.
This time, though, frustrated users of the model powering ChatGPT's paid tier aren't all waiting around for a quick fix.
They're looking to other models instead, with one in particular grabbing their attention: Anthropic's Claude.
OpenAI's top model still seems lazy
In recent days, some users of GPT-4, which was first released in March 2023, have been taking to OpenAI's developer forum and social media to vent about the model seeming far less capable than it once was.
Some complain that it ignores "explicit instructions," returning truncated code when asked for complete code. Others report trouble getting the model to respond to their queries at all.
"The reality is that it has become unusable," one user wrote on OpenAI's online forum last week.
It's not the first time the model's performance has lagged. GPT-4 is supposed to be OpenAI's finest product — one people are paying $20 a month to use.
As my colleague Alistair Barr first reported, signs that GPT-4 was getting lazier emerged in the summer of last year: The model seemed to be exhibiting "weakened logic" and returning wrong responses to users.
More evidence of laziness emerged earlier this year, with OpenAI CEO Sam Altman even acknowledging that GPT-4 had been slacking. He posted on X in February that a fix had been issued to address complaints.
Back when signs of weakness first emerged, though, no other company had released a model that, on paper at least, matched the performance of OpenAI's GPT-4. This kept users attached to the company that arguably triggered last year's generative-AI craze.
That's not the case now.
GPT-4 alternatives emerge
In the face of a fresh batch of issues with GPT-4, users can now experiment with a number of other models that have since emerged. Some of these not only seem to match OpenAI's top offering but may be outperforming it, too.
Take Anthropic's Claude. The OpenAI rival, which is backed by the likes of Google and Amazon, released a premium version of its Claude model, Claude 3 Opus, earlier this month. Think of it as an equivalent to GPT-4.
Upon its release, Anthropic shared data that compared Claude 3 Opus' performance to its peers across several benchmarks like "undergraduate level knowledge," "math problem-solving," "code," and "mixed evaluations." Across almost all of them, Claude came out on top.
It's not just Anthropic's data that says its model is better. This week, Claude 3 Opus overtook GPT-4 on LMSYS Chatbot Arena's leaderboard, an open platform for evaluating AI models.
Of course, there's a difference between something looking good on paper and delivering in practice. But in the wake of GPT-4's problems, even OpenAI loyalists have had plenty of incentive to try alternatives like Claude.
It's clear that many are more than impressed.
After a coding session with Claude 3 Opus last week, a software engineer posted on X that he thought it crushed GPT-4. "I don't think standard benchmarks do this model justice," he wrote.
Allie K. Miller, an AI angel investor, said GPT-4 feels worse now than it did a few months ago. "Most folks I know are using Claude 3" or Mistral AI's Mixtral 8x7B model, she wrote on X.
Ethan Mollick, a Wharton professor, even found Claude 3 to be better versed in J.R.R. Tolkien's constructed Elvish languages of Sindarin and Quenya. "When asked to translate 'My hovercraft is full of eels' Claude 3 does an original translation, GPT-4 searches the web," he wrote on X.
On OpenAI's developer forum, some users have said that Claude 3 Opus is far more reliable at coding and performs on par with GPT-4 as it was when first released.
OpenAI did not respond to a request for comment from Business Insider about GPT-4's performance issues.
Some, like Miller, don't think the issues are enough to ditch OpenAI altogether. The dip in performance, she said, might be because "OpenAI is focused on the next model" and could be devoting resources toward that.
This might be the case. As my colleagues Kali Hays and Darius Rafieyan reported this month, OpenAI is poised to release GPT-5 by mid-year.
The least it can do is not be lazy.