Google's Gemini 3 Pro Tops New AI Math Benchmark

By including deliberately unsolvable tasks, the SOOHAK test pushes AI models toward recognizing their own limits and could reshape future benchmarks.
What Changed
The SOOHAK benchmark marks a noteworthy moment in AI evaluation. It incorporates 439 tasks with 99 deliberately unsolvable problems, which positions it as the first benchmark to combine hand-written and unsolvable tasks in a single framework. Google's Gemini 3 Pro stands out with a 30% success rate on research-level tasks, making it a leader in this nascent area of AI assessment. Benchmarks of this kind matter because they stress-test a model's ability to solve complex problems, not just its computational prowess.
Strategic Implications
Google's lead with Gemini 3 Pro could strengthen its influence in AI research, particularly in advanced problem-solving. Models nonetheless persist in confidently attempting unsolvable tasks, which underscores an ongoing challenge in AI systems' self-awareness and decision-making reliability. These shortcomings point to a need for stronger metacognitive abilities in AI. The benchmark also positions Google to potentially influence future benchmark designs by emphasizing cognitive reflection in AI systems.
What Happens Next
Expect key players in AI, such as OpenAI and Meta, to develop models that respond to unsolvable tasks with more nuance. By Q4 2026, new benchmarks focused on AI's metacognitive judgment might emerge. As AI systems strive for higher accuracy, self-reflective capabilities could become more mainstream, shaping the design priorities of AI models across research and commercial applications.
Second-Order Effects
Improvements in metacognitive AI might affect sectors that rely on complex decision-making models, such as finance and autonomous systems. By revealing limitations in current AI evaluations, the benchmark could also increase pressure to diversify AI benchmark challenges, which might shift academic focus and funding allocations and strengthen ties between AI and cognitive science research.