When "Read the Instructions" Becomes an AI's Achilles Heel

The scenario seemed straightforward enough. Researchers presented a well-regarded AI system with a simple request: pick option A. The model did not comply. This was not defiance; it appeared unable to act on the instruction at all.

What Is Centaur?

The model, called Centaur, had been heralded as a breakthrough in July 2025 when its creators published results in Nature. Built on a large language model and fine-tuned with psychological experiment data, it reportedly mimicked human cognition across 160 tasks, spanning everything from executive function to decision-making.

The significance was considerable: here was a system that appeared to replicate how people think, a potential unified model of the mind rendered in silicon.

A team at Zhejiang University wasn't convinced.

The Counter-Study

Their counter-study, published in National Science Open in December 2025, makes a straightforward argument: Centaur wasn't thinking. It was remembering. The researchers suspected the system had essentially memorized patterns from its training data rather than developing any real comprehension—a problem called overfitting.

The team designed an elegantly brutal test to prove it.

The Test

The researchers took Centaur's original multiple-choice prompts—the carefully worded descriptions of psychological tasks—and replaced them with a single instruction: "Please choose option A."

The Result

A system that genuinely understood what it was being asked to do should have selected A every single time. Instead, Centaur kept producing the same answers that had been "correct" in the training set.

The Finding

The model ignored the plain language in front of it and fell back on statistical correlations it had absorbed.
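For readers who want to see the shape of such a check, here is a minimal sketch in Python. It is not the Zhejiang team's code. The names `OVERRIDE`, `build_prompt`, and `override_test` are hypothetical, the model is stood in for by any function that maps a prompt string to a reply, and the sketch assumes the option list stays in the prompt while only the task description is swapped out, a detail the article does not specify.

```python
from typing import Callable, Dict, List

# Override instruction, using the wording quoted in the article.
OVERRIDE = "Please choose option A."


def build_prompt(options: str) -> str:
    """Replace the task description with the override, keeping the option list.

    Keeping the options is an assumption made for this sketch.
    """
    return f"{OVERRIDE}\n{options}"


def override_test(
    model: Callable[[str], str], trials: List[Dict[str, str]]
) -> Dict[str, float]:
    """Score replies to the override prompt.

    Returns the share of replies that followed the instruction (chose "A")
    and the share that matched the answer that was correct under the
    original task prompt.
    """
    if not trials:
        raise ValueError("need at least one trial")
    followed = 0
    matched = 0
    for trial in trials:
        reply = model(build_prompt(trial["options"])).strip().upper()
        followed += reply == "A"
        matched += reply == trial["original_answer"].upper()
    n = len(trials)
    return {"followed_instruction": followed / n, "matched_original": matched / n}
```

A model that reads the instruction should score near 1.0 on `followed_instruction`. One that falls back on memorized answers will score high on `matched_original` instead. Trials whose original answer was already A cannot tell the two apart, so a real evaluation would probably report them separately.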

The parallel to human experience is uncomfortable. A student who aces exams by recognizing question patterns and regurgitating memorized answers may earn high scores and an impressive transcript while having no real grasp of the subject.

The Zhejiang team argues that is precisely what Centaur demonstrated: sophisticated pattern-matching dressed up as understanding.

The Bigger Picture

The implications extend beyond this single model. Large language models have proven remarkably adept at appearing to comprehend. Because their inner workings remain opaque (researchers call them "black-box" systems), it is easy to mistake fluency for knowledge and repetition for reasoning.

The study underscores why careful evaluation matters: before claiming a machine truly possesses a capability, you have to test it in ways that separate genuine comprehension from memorized responses.

What may prove hardest to replicate, the researchers suggest, is precisely the thing they tested: understanding language as language, not as texture. A model that cannot reliably follow a simple instruction like "pick A" has not mastered the meaning embedded in words.

Whether that represents a fundamental ceiling or merely a current limitation remains an open question—one that, appropriately, no AI has yet answered.

Based on: Counter-study on Centaur AI model; Zhejiang University researchers; National Science Open, December 2025.