OpenAI’s o3 Model Outperforms Google’s Gemini 2.0 in Reasoning Tests

📆 12/30/2024 12:58 PM


ARTIFICIAL INTELLIGENCE, OPENAI, GOOGLE

📰 WIRED


OpenAI unveils its latest AI model, o3, demonstrating superior reasoning capabilities compared to Google’s Gemini 2.0 Flash Thinking. The new model excels in complex coding, math, and science tasks, achieving significantly higher scores on benchmark tests.

OpenAI today announced an improved version of its most capable artificial intelligence model to date—one that takes even more time to deliberate over questions—just a day after Google announced its first model of this type. OpenAI’s new model, called o3, replaces o1, which the company introduced in September. Like o1, the new model spends time ruminating over a problem in order to deliver better answers to questions that require step-by-step logical reasoning.

(OpenAI chose to skip the “o2” moniker because it's already the name of a mobile carrier in the UK.)

“We view this as the beginning of the next phase of AI,” said OpenAI CEO Sam Altman on a livestream Friday. “Where you can use these models to do increasingly complex tasks that require a lot of reasoning.”

The o3 model scores much higher on several measures than its predecessor, OpenAI says, including ones that measure complex coding-related skills and advanced math and science competency. It is three times better than o1 at answering questions posed by ARC-AGI, a benchmark designed to test an AI model’s ability to reason over extremely difficult mathematical and logic problems it is encountering for the first time.

Google is pursuing a similar line of research. Noam Shazeer, a Google researcher, yesterday revealed in a post on X that the company has developed its own reasoning model, called Gemini 2.0 Flash Thinking. Google’s CEO, Sundar Pichai, called it “our most thoughtful model yet” in his own post. Google’s new model achieved a high score on SWE-Bench, a test that measures a model’s agentic abilities. However, OpenAI’s new o3 model is 20 percent better than o1. “o3 blew it out of the water,” says Ofir Press, a post-doctoral researcher at Princeton University who helped develop SWE-Bench. “Very surprising increase, not sure how they did it.”

The two dueling models show competition between OpenAI and Google to be fiercer than ever.




Similar News: You can also read news stories similar to this one that we have collected from other news sources.

Google's Gemini 2.0 Flash Thinking: An AI Model that Explains its Reasoning
Google has unveiled Gemini 2.0 Flash Thinking, an experimental AI model designed to answer complex questions while providing a step-by-step breakdown of its thought process. This model, powered by the faster Gemini Flash 2.0, aims to compete with OpenAI's o1 reasoning model. Demonstrations showcase Gemini 2.0 Flash Thinking solving physics problems and tasks involving both visual and textual elements. While not replicating human reasoning, it breaks down instructions into smaller tasks for improved outcomes. The model is available for public testing on Google's AI Studio.

Gemini 2.0 is Google's most capable AI model yet and available to preview today

Google’s new Gemini 2.0 AI model is about to be everywhere

Gemini 2.0: what’s new in Google’s new flagship AI model
Google says Gemini 2.0 can generate images and audio, is faster and cheaper for developers to run, and powers new experiences like Astra and Mariner.

Google Tests New Gemini 2.0 Experimental AI Model
Google is testing a new experimental AI model called 'Gemini-Exp-1206', which offers significant performance improvements in complex tasks like coding, math, reasoning, and instruction following. This '2.0 Experimental Advanced' model is currently available to Gemini Advanced subscribers and is expected to be followed by more model sizes in January.

Gemini App Gets Upgrade with Faster 2.0 Flash Experimental Model
Google has rolled out an update to its Gemini app, making the latest 2.0 Flash Experimental AI model available to all Android users. This update, powered by the Google app, offers a faster and more efficient experience compared to previous versions. While still in experimental stages, the 2.0 Flash model shows significant improvements and is expected to be released to developers in January. Users are encouraged to provide feedback on their experience with this new model.