Google's Gemini 3 model keeps the AI hype train going

Source: New Scientist

Google’s latest model reportedly beats its rivals in several benchmark tests, but issues with reliability mean concerns remain over a possible AI bubble

Google’s latest chatbot, Gemini 3, has made significant leaps on a raft of benchmarks designed to measure AI progress, according to the company. These achievements may be enough to allay fears of an AI bubble for now. But the hallucinations and factual inaccuracies that have become a hallmark of all large language models show no signs of being ironed out, which could prove problematic for any uses where reliability is vital.

Announcing the new model, Google bosses Sundar Pichai, Demis Hassabis and Koray Kavukcuoglu write that Gemini 3 has “PhD-level reasoning”, a phrase that competitor OpenAI has also used in one of its own announcements. As evidence for this, they list scores on several tests designed to measure “graduate-level” knowledge, such as Humanity’s Last Exam, a set of 2500 research-level questions from maths, science and the humanities. Gemini 3 scored 37.5 per cent on this test, outclassing the previous record holder, a version of OpenAI’s GPT-5, which scored 26.5 per cent.

Rocher, at the University of Oxford, says we need to be careful about how we interpret these results. “If a model goes from 80 per cent to 90 per cent on a benchmark, what does it mean? Does it mean that a model was 80 per cent PhD level and now is 90 per cent PhD level? I think it’s quite difficult to understand,” they say. “There is no number that we can put on whether an AI model has reasoning, because this is a very subjective notion.”

Benchmark tests have many limitations, such as requiring a single answer or multiple choice answers for which models don’t need to show their working. “It’s very easy to use multiple choice questions to grade [models],” says Rocher, “but if you go to a doctor, the doctor will not assess you with a multiple choice. If you ask a lawyer, a lawyer will not give you legal advice with multiple choice answers.” There is also a risk that the answers to such tests were hoovered up in the training data of the AI models being tested, effectively letting them cheat.

The real test for Gemini 3 and the most advanced AI models – and whether their performance will be enough to justify the trillions of dollars that companies like Google and OpenAI are spending on AI data centres – will be in how people use the model and how reliable they find it, says Rocher. Google says the model’s improved capabilities will make it better at producing software, organising email and analysing documents. The firm also says it will improve Google search by supplementing AI-generated results with graphics and simulations.

It is likely that the real improvements will be for people who use AI tools to autonomously write code, a process called agentic coding, says a researcher at the University of Oxford. “I think we’re hitting the upper limit of what a typical chatbot can do, and the real benefits of Gemini 3 Pro will probably be in more complex, potentially agentic workflows, rather than everyday chatting,” he says.

Some posts online have praised Gemini’s coding capabilities and ability to reason, but as with all new model releases, there have also been posts highlighting failures to do apparently simple tasks.

Google admits, in Gemini 3’s technical specifications, that the model will continue to hallucinate and produce factual inaccuracies some of the time, at a rate that is roughly comparable with other leading AI models. The lack of improvement in this area is a big concern, says a researcher at City St George’s, University of London. “The problem is that all AI companies have been trying to reduce hallucinations for more than two years, but you only need one very bad hallucination to destroy trust in the system for good,” he says.




