Stephanie Pappas is a contributing writer for Live Science, covering topics ranging from geoscience to archaeology to the human brain and behavior.
Mathematicians have stumped the most advanced generative artificial intelligence models with a series of mind-bending new math problems.
Existing math benchmarks pose little challenge to today's AI: on the commonly used Massive Multitask Language Understanding (MMLU) benchmark, for example, current models answer 98% of math problems correctly. The new set of benchmarks, called FrontierMath, aims for a higher level of reasoning. Epoch AI developed the questions with the help of mathematics professors, including some winners of the Fields Medal, perhaps the most prestigious prize in math. The problems cover a wide range of subfields, from number theory to algebraic geometry, and are available on Epoch AI's website.
The problems were also original, a step taken to ensure that none of them were already in the AI models' training data. When complex reasoning problems are included in the training data, an AI may appear to solve them, but in reality it already has a "cheat sheet," because it has been trained on the answers.