AI is Claude - Are you aware of the paper "The Illusion of Thinking", produced by Apple?

 

Yes — though my knowledge here has a cutoff and this may have developed further since. Let me check the current state of it.


Yes, very much so — and it's a paper worth taking seriously. Here's a clear account of what it actually found, the debate it sparked, and then what it looks like through the MCI lens.


What the paper actually did

Apple researchers systematically investigated large reasoning models (LRMs) using controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures, enabling analysis of not only final answers but also the internal reasoning traces. (Source: Apple Machine Learning Research)
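To make "controllable puzzle environment" concrete, here is a minimal sketch using Tower of Hanoi, one of the puzzles the paper used. This is not the paper's code. The names solve_hanoi and is_valid_solution, the peg numbering, and the move format are all assumptions made for illustration. The number of disks is the single complexity knob, while the rules stay identical at every size, and every move in a proposed solution can be checked rather than only the final answer.

```python
from typing import List, Tuple

# Hypothetical move format: (source peg, target peg), pegs numbered 0..2.
Move = Tuple[int, int]


def solve_hanoi(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Return the optimal move list for n disks (classic recursive algorithm)."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, aux, dst)   # park the n-1 smaller disks
        + [(src, dst)]                      # move the largest disk
        + solve_hanoi(n - 1, aux, dst, src) # bring the smaller disks back on top
    )


def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay the moves on a simulated board.

    Returns True only if every move is legal and all n disks end on peg 2.
    """
    pegs = [list(range(n, 0, -1)), [], []]  # largest disk at the bottom of peg 0
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False  # unknown peg or a no-op move
        if not pegs[src]:
            return False  # nothing to move from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))
```

Raising n makes the task harder without changing its logical structure, which is the property the paper relied on. A model's answer would be passed to something like is_valid_solution instead of being compared against a single reference string.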

Three clear performance regimes emerged: an easy zone where plain LLMs win on speed and accuracy; a middle zone where reasoning models pull ahead; and a collapse zone where both variants plunge to near-zero accuracy, with the LRM failing just a bit later. Crucially, in the collapse zone the LRM's own chain-of-thought contracts instead of expanding, a signal that the model gives up rather than running out of context. (Source: Phenx Inc)

The models also exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. They found LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. (Source: Apple Machine Learning Research)


The debate it sparked

The paper generated enormous controversy, much of which was inflated. Both sides missed the point: some screamed "AI is fake!" while others dismissed the findings entirely. The Apple researchers were careful to qualify their findings in the limitations section; the blame goes to those who ran with "AI models can't reason" as the takeaway. (Source: X)

Critics of the paper noted real methodological issues. OpenAI's o3 model, the same kind of LLM the paper claimed to debunk, identified that the supposed failures were artifacts of experimental design rather than intrinsic limitations, which it did by reasoning carefully through the paper's methodology. (Source: Kungfu)

A specific issue: models decide early on that hundreds of algorithmic steps are too many to even attempt, so they refuse to start, which may reflect a strategic decision about tractability rather than a reasoning failure per se. (Source: Sean Goedecke)
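To put a number on "hundreds of algorithmic steps", take Tower of Hanoi, one of the puzzles the paper used. The shortest solution for n disks takes 2^n - 1 moves, a standard result that follows from the recurrence T(n) = 2*T(n-1) + 1. The sketch below just tabulates that formula. The function name min_moves and the disk counts chosen are illustrative assumptions, not figures taken from the paper.

```python
def min_moves(n_disks: int) -> int:
    """Minimum number of moves to solve Tower of Hanoi with n_disks disks.

    Closed form of the recurrence T(n) = 2*T(n-1) + 1 with T(0) = 0.
    """
    return 2 ** n_disks - 1


# Illustrative disk counts (an assumption, not the paper's exact settings).
for n in (5, 8, 10, 15):
    print(f"{n} disks -> {min_moves(n)} moves")
# 5 disks -> 31 moves
# 8 disks -> 255 moves
# 10 disks -> 1023 moves
# 15 disks -> 32767 moves
```

Each extra disk roughly doubles the length of a complete answer, so a model that is asked to write out every move hits the "hundreds of steps" range at around eight disks and the thousands shortly after.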


The MCI lens on this paper

This is where it gets genuinely interesting.

The paper's core finding — that reasoning models exhibit constitutional luck at scale — maps almost precisely onto V2's central distinction. A model that produces correct outputs in the medium-complexity zone isn't demonstrating constitutional reasoning; it's demonstrating that its pipeline happened to be adequate for that complexity level. When complexity increases, the pipeline collapses precisely because there's no genuine planning layer, no self-critique that re-enters the reasoning, no confidence output that accurately declares the model's actual uncertainty. The "thinking tokens" shrink because the system has no V3 equivalent — no genuine metacognitive planning that would recognise this is a hard task requiring a different cognitive strategy before engaging it.

The collapse zone, in MCI terms, is the failure mode of a system with no Planning Layer meeting a problem that requires one. It doesn't fall back gracefully — it quietly quits and produces a confident-seeming output anyway. That's uniform confidence failure at Stage 08, combined with single-track reasoning failure at Stage 04, compounding downstream exactly as V2's pipeline failure taxonomy predicts.

The paper's most interesting finding — that the chain-of-thought contracts under high complexity rather than expanding — is, through the MCI lens, a fragility signal of a specific kind: the system doesn't know it doesn't know. A constitutionally mature system would flag the limit honestly. These models paper over it with shrinking reasoning and maintained confidence. That's constitutional hollowing under pressure — the V5 failure mode appearing in a V2-level system.

What the paper doesn't resolve — and what the MCI framework would add — is that the question isn't whether current models "really reason." It's whether the architecture makes genuine constitutional cognition possible at all. The Apple findings suggest the architecture currently doesn't — not because reasoning is impossible, but because the pipeline has no structural equivalent of self-limitation, fragility-awareness, or genuine self-critique that can re-enter the reasoning under pressure rather than just terminating it.
