
OpenAI's big idea to increase the safety of its tech is to have AI models police each other

Jul 20, 2024, 20:51 IST
Business Insider
OpenAI is testing a host of safety techniques as part of a long-term initiative. NurPhoto/Getty
  • OpenAI is experimenting with a technique to enhance transparency with its AI models.
  • The method involves powerful AI models explaining their thought processes to a second AI.
OpenAI has a new technique for getting AI models to be more transparent about their thought processes: getting them to talk to each other.

The company showcased the research behind the technique this week and will unveil more details in a forthcoming paper, according to Wired.

The gist is that putting two AI models in discussion with one another forces the more powerful one to be more open about its thinking. And that can help humans better understand how these models reason through problems.

OpenAI tested the technique by asking AI models to solve basic math problems. The more powerful model explained how it solved each problem, while the second model listened for errors in its answers.
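The setup described above can be sketched as a toy prover-verifier loop. This is a hypothetical illustration only: OpenAI's actual system pairs two neural language models, whereas the sketch below uses simple rule-based stand-ins to show the shape of the interaction, where a "prover" must show its work step by step and a "verifier" replays each step to catch errors.

```python
# Hypothetical sketch of a prover-verifier interaction (not OpenAI's code).
# The "prover" stands in for the more powerful model; the "verifier" for
# the second model that checks the prover's stated reasoning.

def prover_solve(a, b, c):
    """'Prover': computes (a + b) * c and exposes its reasoning as steps.

    Each step records the operation, its inputs, and the claimed result,
    so the reasoning is legible rather than a bare final answer.
    """
    total = a + b
    return [("add", a, b, total), ("mul", total, c, total * c)]

def verifier_check(steps):
    """'Verifier': independently replays each step and flags any mismatch."""
    ops = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}
    for op, x, y, claimed in steps:
        if ops[op](x, y) != claimed:
            return False, (op, x, y, claimed)  # first faulty step
    return True, None

steps = prover_solve(2, 3, 4)
ok, bad_step = verifier_check(steps)
```

The design point the research highlights is the second half: because every intermediate claim is checked, the prover is pushed toward reasoning that is easy to verify, which is what makes its thinking more legible to humans as well.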

The technique is one of several that OpenAI has released over the past few weeks that are "core to the mission of building an [artificial general intelligence] that is both safe and beneficial," Yining Chen, a researcher at OpenAI involved with the safety work, told Wired. The company also released a new scale to mark its progress toward artificial general intelligence.

The company's new initiative follows a few tumultuous months in its safety department. In May, OpenAI's cofounder and chief research officer, Ilya Sutskever, announced he was leaving, just six months after he spearheaded the failed ouster of CEO Sam Altman. Hours later, Jan Leike, another researcher at the company, followed suit. Leike and Sutskever co-led OpenAI's superalignment group — a team that focused on making artificial-intelligence systems align with human interests. A week later, OpenAI policy researcher Gretchen Krueger joined the ranks of departing employees, citing "overlapping concerns."

Their departures heightened concern about OpenAI's commitment to safety as it develops its technology. Last March, Tesla CEO Elon Musk was among multiple experts who signed a letter raising concerns about the rapid pace of AI development. More recently, AI expert and University of California Berkeley professor Stuart Russell said that OpenAI's ambitions to build artificial general intelligence without fully validating safety were "completely unacceptable."

OpenAI did not immediately respond to a request for comment from Business Insider.
