Apple's new AI aims to take on GPT-4 with its ability to understand context clues
- Apple researchers developed a new AI system to "see" and interpret context from on-screen content.
- The "Reference Resolution As Language Modeling" system allows for more natural interactions with AI.
Apple's new development in AI aims to take on OpenAI's GPT products and may make your interactions with virtual assistants like Siri more intuitive.
The ReALM system, short for "Reference Resolution As Language Modeling," resolves ambiguous references to on-screen images and content, as well as to earlier conversational context, enabling more natural interactions with AI.
The new Apple system outperforms other large language models, including GPT-4, at determining what ambiguous linguistic expressions refer to in context, according to the researchers who created it. And because ReALM is less complex than large language models like OpenAI's GPT series, the researchers called it "an ideal choice" for a context-deciphering system "that can exist on-device without compromising on performance."
For example, say you ask Siri to show you a list of local pharmacies. Once the list appears, you might say "Call the one on Rainbow Road" or "Call the bottom one." With ReALM, instead of returning an error message asking for more information, Siri could work out the context needed to complete such a task, and do so better than GPT-4, according to the Apple researchers who created the system.
"Human speech typically contains ambiguous references such as 'they' or 'that,' whose meaning is obvious (to other humans) given the context," the researchers wrote about ReALM's abilities. "Being able to understand context, including references like these, is essential for a conversational assistant that aims to allow a user to naturally communicate their requirements to an agent, or to have a conversation with it."
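The article doesn't describe ReALM's internals, but the basic idea of treating reference resolution as a language-modeling task can be sketched: serialize the on-screen entities into plain text, append the user's utterance, and ask a model which entity is meant. The toy version below uses the pharmacy example, with a trivial rule-based function standing in for the language model; the entity names and resolution rules are illustrative assumptions, not Apple's actual implementation.

```python
# Toy sketch of "reference resolution as language modeling":
# on-screen entities are serialized to text, and a resolver picks
# which entity an ambiguous utterance ("the bottom one") refers to.
# A real system would feed the serialized screen plus the utterance
# to a language model; here a simple rule-based stand-in is used.

def serialize_screen(entities):
    """Render on-screen entities as numbered text lines, top to bottom."""
    return "\n".join(f"{i}. {name}" for i, name in enumerate(entities, 1))

def resolve_reference(entities, utterance):
    """Stand-in for the model: handle positional and content references."""
    text = utterance.lower()
    if "bottom" in text or "last" in text:
        return entities[-1]          # positional: "the bottom one"
    if "top" in text or "first" in text:
        return entities[0]
    # Content reference: match utterance words against entity text.
    for name in entities:
        if any(word in name.lower() for word in text.split()):
            return name
    return None

pharmacies = [
    "Walgreens, 1200 Main St",
    "CVS Pharmacy, 45 Rainbow Rd",
    "Rite Aid, 9 Harbor Ave",
]

print(serialize_screen(pharmacies))
print(resolve_reference(pharmacies, "Call the one on Rainbow Road"))
print(resolve_reference(pharmacies, "Call the bottom one"))
```

The point of the framing is that once the screen is rendered as text, disambiguating "the bottom one" becomes an ordinary language-understanding problem that a compact on-device model can handle.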
The ReALM system can also interpret images embedded in text, which the researchers say could be used to extract information like phone numbers or recipes from on-page images.
OpenAI's GPT-3.5 accepts only text input. GPT-4 can also contextualize images, but it is a much larger system trained mostly on natural, real-world images rather than screenshots, which the Apple researchers say limits its practical performance and makes ReALM the better option for understanding on-screen information.
"Apple has long been seen as a laggard to Microsoft, Google, and Amazon in developing conversational AI," The Information reported. "The iPhone maker has a reputation for being a careful, deliberate developer of new products — a tactic that's worked well to gain the trust of consumers but may come to hurt it in the fast-paced AI race."
But with the teasing of ReALM's capabilities, it appears Apple may be getting ready to enter the race in earnest.
The researchers behind ReALM and representatives for OpenAI did not immediately respond to requests for comment from Business Insider.
It remains unclear when or whether ReALM will be implemented into Siri or other Apple products, but CEO Tim Cook said during a recent earnings call that the company is "excited to share details of our ongoing work in AI later this year."