
AI models from OpenAI and other tech giants are being bombarded by a new swarm of bots 'extracting intelligence'

Sep 21, 2023, 22:18 IST
Business Insider
Vercel CEO Guillermo Rauch. Vivan Cromwell/Zeit
  • Vercel CEO Guillermo Rauch spotted a new breed of bot recently.
  • These bots scrape information from AI models, including OpenAI's GPT-4.

Powerful AI models, such as OpenAI's GPT-4, are being bombarded by digital bots that are "extracting intelligence" in new and nefarious ways.

The phenomenon was spotted recently by Guillermo Rauch, CEO of Vercel, a startup that helps developers build websites that integrate with many of the biggest AI models.

He discussed this new breed of bot on the No Priors podcast with venture capitalists Elad Gil and Sarah Guo.

"It's almost like, extracting intelligence," Rauch said. "Let's call it web scraper 2.0. I run a bot that tries to get free GPT-4 basically."

It's a huge problem, he added, so I called him up to delve deeper.


'Threat of model distillation'

The generative AI boom has sparked unprecedented demand for quality data. AI models need this content for training. Without it, the technology just isn't as good. And there's not enough to go round.

Rauch says this is one driver of these new bots. If you can cleverly scrape the outputs of GPT-4, Llama 2 and other powerful AI models, then you could use that as fresh training data for your own model, he explained.

"There's a threat of model distillation," he said. "AI models can, in theory, share everything they know. It's plausible that you can train another model based on 100,000 high quality outputs from GPT-4, for example."

Indeed, several of the top AI companies, including OpenAI, Google and Anthropic, ban the use of their outputs for training other models.
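Distillation, in the sense Rauch describes, means harvesting a large "teacher" model's outputs as training data for a smaller "student" model. A minimal sketch of the data-collection step, using a placeholder function rather than any vendor's actual API (most providers' terms forbid training competing models on their outputs):

```python
import json

# Hypothetical stand-in for a large "teacher" model's completion call.
# A real scraper would hit a vendor's API; this placeholder just echoes.
def teacher_complete(prompt: str) -> str:
    return f"answer to: {prompt}"  # illustrative output only

prompts = ["What is model distillation?", "Explain API rate limiting."]

# Collect (prompt, completion) pairs -- the raw material for training
# a "student" model to imitate the teacher's behavior.
dataset = [{"prompt": p, "completion": teacher_complete(p)} for p in prompts]

# Serialize as JSONL, a common fine-tuning data format.
jsonl = "\n".join(json.dumps(row) for row in dataset)
print(jsonl)
```

At scale — the "100,000 high quality outputs" Rauch mentions — a file like this becomes a fine-tuning corpus, which is exactly why providers prohibit it.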

A surprise $35,000 OpenAI bill

Another reason: It's increasingly expensive to use the top-performing models. OpenAI and other tech companies impose rate limits, so even paying users can ask only a limited number of questions per minute or per day.


Instead of abiding by these rules, bad actors are creating bots that bombard models with questions and leave someone else paying the bill for all the answers. This is often done by infiltrating applications that have official accounts and API connections with the largest and most powerful AI models, Rauch explained.

"A lot of people are writing bots that try to go after web applications that rely on AI," he said. "These are essentially proxies to pull out this information, sometimes on behalf of users who are not paying to access the models."

One developer Rauch knows was a victim of this type of attack. She has an application for data scientists that queries a major large language model. Bots attacked it, essentially using her app as a proxy to access the AI model.

"She ran up a $35,000 OpenAI bill in a very short time," Rauch said. "She spent months trying to explain that this wasn't her usage. Eventually OpenAI refunded her." OpenAI didn't respond to a request for comment.

Evading China's AI model blockade

A third reason for this new phenomenon: China blocked access to ChatGPT, GPT-4 and many of the other top generative AI models. Creating a bot that secretly collects all the best outputs is one way to get around that country's censorship, Rauch explained.


Hundreds of thousands of AI applications are being deployed on Vercel's platform each month, so there are plenty of targets for these new bots.

Vercel offers technology to help developers protect against these attacks.

SaaS businesses at risk

Rauch also sees SaaS businesses being challenged by this phenomenon. These types of companies often sell per-seat subscriptions that cost maybe $5 or $10 a month for unlimited use.

New AI versions of SaaS services that query large AI models could be attacked by bots and end up paying for outputs that their real customers aren't getting, he explained.

"Your SaaS business might find itself upside down and losing money," Rauch said. "So there will be more usage-based charging. A platform fee, a seat and a per-token charge or a per-query charge."


Vercel already integrates rate limits for developers, he noted. So an app could offer a seat where the user can only query AI models a certain number of times per day.

"That stops attacks by outside bots that will do massive numbers of requests to steal intelligence," Rauch said.
