How does this compare to similar events?

Unlike traditional benchmarks such as ImageNet, SOOHAK includes deliberately unsolvable tasks, which test a model's capacity for cognitive reflection.

What outcome is predicted from this development?

Based on current trends, a follow-up benchmark with more advanced metacognitive testing could appear by Q4 2026.


Google's Gemini 3 Pro Tops New AI Math Benchmark

Global AI Watch · Editorial Team · 17 May 2026 · 4 min read

Editorial Perspective

By incorporating unsolvable tasks, the SOOHAK test pushes AI models toward recognizing their own limits, which could reshape future benchmarks.

What Changed

The SOOHAK benchmark marks a noteworthy moment in AI evaluation. It comprises 439 tasks, including 99 purposefully unsolvable problems, which makes it the first benchmark to combine hand-written and unsolvable tasks in a single framework. Google's Gemini 3 Pro stands out with a 30% success rate on research-level problems, making it a leader in this nascent area of AI assessment. Benchmarks of this kind matter because they stress-test complex problem-solving rather than raw computational ability.

Strategic Implications

Google's lead with Gemini 3 Pro could strengthen its influence in AI research, particularly in advanced problem-solving. Models' persistence in confidently attempting unsolvable tasks underscores an ongoing weakness in AI systems' self-awareness and decision-making reliability, and points to a need for stronger metacognitive abilities. The benchmark also gives Google an opening to influence future benchmark designs that emphasize cognitive reflection.

What Happens Next

Expect key players such as OpenAI and Meta to develop models that respond to unsolvable tasks with more nuance. By Q4 2026, new benchmarks focused on metacognitive judgment might emerge. As AI systems strive for higher accuracy, self-reflective capabilities could become more mainstream and shape the design priorities of models across research and commercial applications.

Second-Order Effects

Improvements in metacognitive AI might affect sectors that rely on complex decision-making models, such as finance and autonomous systems. Because the benchmark reveals limitations in current AI evaluations, pressure to diversify benchmark challenges could increase. That, in turn, might shift academic focus and funding allocations and strengthen ties between AI and cognitive science research.

