Elon Musk is putting his AI chips to work — and he's catching up with Mark Zuckerberg

Sep 3, 2024, 21:17 IST
Business Insider
Elon Musk said xAI has brought "the most powerful AI training system in the world" online. Marc Piasecki/Getty Images
  • Elon Musk just put a whole bunch of Nvidia chips to work.
  • He said on Monday his company, xAI, brought its AI training cluster, Colossus, online.

Elon Musk might be distracted right now by his battle with Brazil's Supreme Court over its decision to ban X, but he isn't letting that stop him from pushing forward with his AI ambitions.

On Monday, the billionaire said xAI — the company he launched in July 2023 — had brought a massive new training cluster of chips online over the weekend, claiming it represented "the most powerful AI training system in the world."

The system, dubbed "Colossus," was built at a site in Memphis using 100,000 of Nvidia's H100 GPUs. According to Musk, the cluster was built in 122 days and will "double in size" in a few months as more GPUs are added to the mix.

Though Musk previously confirmed the size of the cluster in July, bringing it online marks a key step forward for his AI ambitions and, critically, allows him to play catch-up with Silicon Valley nemesis Mark Zuckerberg.

Like Zuckerberg's, Musk's ambitions — to turn xAI into a company that advances "our collective understanding of the universe" with its Grok chatbot — depend on high-performance GPUs, which provide the computing power required to train powerful AI models.


These haven't exactly been easy to come by, nor have they been cheap.

The hype around AI since the release of ChatGPT in late 2022 has left companies scrambling for Nvidia GPUs, with shortages stemming from frenzied demand and supply constraints. In some instances, individual chips have sold for upward of $40,000.

That said, these barriers to access haven't stopped companies from securing a supply of GPUs in any way they can and putting them to work to edge ahead of rivals.

Llama vs Grok

Nathan Benaich, the founder and a general partner at Air Street Capital, has been tracking the number of H100 GPUs acquired by tech companies. He puts Meta's total at 350,000 and xAI's at 100,000. Tesla, one of Musk's other companies, has 35,000.

Earlier this year, Zuckerberg said that Meta would have a stockpile of 600,000 GPUs by the end of the year, with some 350,000 of those being Nvidia's H100s.


Others, like Microsoft, OpenAI, and Amazon, haven't disclosed the size of their H100 stockpiles.

Meta hasn't disclosed exactly how many GPUs from Zuckerberg's 600,000 target it has secured, or how many have been put to use. However, in a research paper published in July, Meta noted that the largest version of its Llama 3 large language model had been trained on 16,000 H100 GPUs. In March, the company also announced "a major investment in Meta's AI future" in the form of two 24,000-GPU clusters to support the development of Llama 3.

That suggests xAI's new training cluster, with its 100,000 H100 GPUs, is far larger than the one Meta had used to train its largest AI model as of July.

The scale of the feat hasn't been lost on the industry.

On X, a post from Nvidia's data center account in response to Musk read: "Exciting to see Colossus, the world's largest GPU #supercomputer, come online in record time."


xAI cofounder Greg Yang, meanwhile, had a more colorful response to the news, riffing on a song by the American rapper Tyga.

Shaun Maguire, a partner at the venture capital firm Sequoia, wrote on X that the xAI team now "has access to the world's most powerful training cluster" to build the next version of its Grok chatbot. He added: "In the last few weeks Grok-2 catapulted to being roughly at parity with the state-of-the-art models."

But, as with most AI companies, big questions remain about commercializing the technology. "It's impressive xAI has been able to raise so much with Elon and make progress, but their product strategy remains unclear," Benaich told Business Insider.

Back in July, Musk said the next version of Grok — after training on 100,000 H100s — "should be really something special."

We'll find out soon enough how competitive it makes him with Zuckerberg on AI.
