The AI Boom’s Unspoken Secret: Are Reasoning Models Any Better?

I’ve been watching the hype around reasoning-capable AI, and it all seemed so polished – intelligent, methodical, reasonable. We’re told these models can break down tough tasks into logical steps, produce a chain‑of‑thought and edge us closer to real “thinking.” But recent research throws cold water on that image.

Apple’s "Illusion of Thinking"


In June, Apple’s researchers published “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity.” [arXiv] They pitted so‑called large reasoning models (LRMs) against clean logic puzzles – Tower of Hanoi, River Crossing and Blocks World – with programmable complexity.
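
To get a feel for how fast “programmable complexity” ramps up, here is a minimal Python sketch (my own illustration, not Apple’s benchmark code). The optimal Tower of Hanoi solution doubles in length with every disc added:

```python
# Tower of Hanoi: the optimal solution length is 2**n - 1 moves,
# so each extra disc doubles the plan a model must execute flawlessly.

def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Recursively collect the optimal move sequence for n discs."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((src, dst))
        return moves
    hanoi(n - 1, src, dst, aux, moves)   # shift the n-1 smaller discs aside
    moves.append((src, dst))             # move the largest disc
    hanoi(n - 1, aux, src, dst, moves)   # stack the smaller discs back on top
    return moves

for n in range(1, 11):
    print(f"{n:>2} discs -> {len(hanoi(n)):>4} moves")  # 1, 3, 7, ... 1023
```

Ten discs already demand a flawless 1,023-move plan; twenty demand over a million moves. The threshold arrives sooner than intuition suggests.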

Findings? These models perform well up to a point, but once complexity hits a threshold, their accuracy collapses to zero. They don’t ramp up; they give up. Some even stop using their token budget midway, essentially throwing in the towel. [Arize]

Journalists at The Verge reported the same: “LRMs face a complete accuracy collapse beyond certain complexities.” [The Verge]

Enterprise Echoes: Jagged Intelligence

Over at Salesforce, researchers have coined the term jagged intelligence to describe this erratic behaviour. [Salesforce] Yes, reasoning models can shine on medium‑difficulty tasks, but their performance is inconsistent, with sharp drops in capability exactly where reliability matters in real‑world settings.

Enterprise General Intelligence remains aspirational. As Salesforce frames it, businesses need capability + consistency, not just impressive bursts of intelligence. [IT Pro]

Critics Say Not So Fast

Some say Apple’s results reflect limitations of the study, not the tech itself. Professor Seok Joon Kwon argued in Tom’s Hardware that Apple’s hardware infrastructure wasn’t up to scratch, lacking the large‑scale, GPU‑heavy setups needed to fully unlock model reasoning. [Tom’s Hardware]

Meanwhile, on Hacker News, early adopters pointed out how flawed benchmarking – such as scoring models on impossible puzzle configurations – could skew the conclusions. An “illusion”, yes, but perhaps more in the experiment design than in the models themselves.

Why This Matters

  1. The enterprise danger: Brands are leaning on reasoning models to automate complex workflows – fraud detection, legal summarisation, decision support. If they can’t cope with real complexity, just imagine the compliance, ethical or financial fallout when a model “gives up” mid-task.

  2. AGI dreams? If reasoning collapses at modest problem sizes, we’re a long way from Artificial General Intelligence. These are still pattern-matchers, not thinkers. Getting there will take major breakthroughs, and technology we don’t yet have.

  3. Hype vs reality: Vendors market chain‑of‑thought outputs as proof of reasoning. But that might just be window dressing. As IT Pro’s podcast noted, the models often “give up” when pushed far enough. [IT Pro]


A Balanced View

But let’s not throw the baby out with the bathwater. Vox reminds us that practical task completion matters more than philosophical “thinking.” AI is already automating entry‑level roles, regardless of whether it truly reasons. [Vox]

Energy and efficiency gains from AI remain real. Yet the sugar of automation must come with a pill of realism: claiming full-scale chain‑of‑thought reasoning for every problem is misleading – and can hurt credibility.

What Should European Businesses Do?

  • Benchmark wisely. Build your own complexity tests rather than relying on vendor marketing. Use tools like Salesforce’s SIMPLE benchmark or puzzle suites like Apple’s to simulate real tasks (see the sketch after this list).

  • Go hybrid. For critical tasks, combine reasoning models with human oversight, or add domain‑specific rule-based systems or symbolic AI to catch the fractures.

  • Demand transparency. Ask vendors how their systems behave under stress. Use explainable AI (XAI) techniques to probe how and why decisions were made. [Wikipedia]

  • Prepare for jaggedness. Model brittleness isn’t just a research quirk – it’s a risk. Plan fallback strategies if models fail mid-task.
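
On the first bullet, here is a minimal sketch of what a home-grown complexity test can look like. Everything in it is illustrative: make_task and the mock ask_model are placeholders you would swap for your own task generator and a real call to your vendor’s API.

```python
# Minimal sketch of a home-grown complexity benchmark (illustration only).
import random

def make_task(complexity: int) -> tuple[str, int]:
    """An arithmetic-chain prompt whose length scales with `complexity`."""
    terms = [random.randint(1, 9) for _ in range(complexity)]
    return "Compute: " + " + ".join(map(str, terms)), sum(terms)

def ask_model(prompt: str) -> str:
    # Mock model: solves short chains, collapses past a threshold,
    # loosely mimicking the accuracy cliff the research describes.
    # Replace this stub with a real call to your vendor's API.
    terms = [int(t) for t in prompt.removeprefix("Compute: ").split(" + ")]
    return str(sum(terms)) if len(terms) <= 20 else "I give up"

def accuracy_at(complexity: int, trials: int = 50) -> float:
    hits = 0
    for _ in range(trials):
        prompt, answer = make_task(complexity)
        hits += ask_model(prompt).strip() == str(answer)
    return hits / trials

# Sweep complexity upward and look for the cliff, not just the average score.
for c in (5, 10, 20, 40, 80):
    print(f"complexity {c:>3}: accuracy {accuracy_at(c):.0%}")
```

The design point is the sweep: don’t settle for a single averaged score. Plot accuracy against complexity and check where, and how abruptly, it falls.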

Final Take

So, are reasoning models really thinking? The research suggests that while they have potential, they also have sharp limitations – they are not thinking, just well trained. They’re prone to giving up once things get tough. That doesn’t negate their value – it just reminds us we’re not deploying digital Sherlock Holmes. Not yet.

Our job? Strip away the hype, build robust systems with sensible oversight and ask the hard questions. Only then will we tap AI’s power without stepping off a cliff.

North Atlantic

Victor A. Lausas
Chief Executive Officer
Want to dive deeper?
Subscribe to North Atlantic’s email newsletter and get your free copy of my eBook,
Artificial Intelligence Made Unlocked. 👉 https://www.northatlantic.fi/contact/
Hungry for knowledge?
Discover Europe’s best free AI education platform, NORAI Connect, start learning AI or level up your skills with free AI courses and future-proof your AI knowledge. 👉 https://www.norai.fi/