Older chatbots, just like older people, show signs of cognitive decline, a new study finds.
People are increasingly relying on artificial intelligence for medical diagnoses because of how quickly and effectively these tools can spot abnormalities and warning signs in medical histories, X-rays and other datasets before they become apparent to the naked eye. But a new study published in the journal BMJ raises concerns that AI technologies, such as large language models (LLMs) and chatbots, are showing signs of cognitive decline over time, just like humans.
“These findings challenge the assumption that artificial intelligence will soon replace human doctors,” the study authors write, “as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnosis and undermine patient trust.”
As LiveScience reports, the scientists tested publicly available LLM-based chatbots, including OpenAI's ChatGPT, Anthropic's Claude Sonnet, and Alphabet's Gemini, using the Montreal Cognitive Assessment (MoCA), a test that neuroscientists use to assess attention, memory, language, spatial skills, and executive cognitive function.
Cognitive assessment in chatbots
The MoCA is commonly used to assess cognitive impairment in conditions such as Alzheimer’s disease or dementia. Subjects undertake tasks such as drawing a specific time on a blank clock, starting at 100 and repeatedly subtracting seven, remembering as many words as possible from a verbal list, and so on. In humans, a score of 26 or higher out of 30 is considered successful (i.e., the subject does not exhibit cognitive impairment).
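Two of the details above, the "serial sevens" subtraction task and the 26-out-of-30 cutoff, can be sketched in a few lines of Python. This is purely illustrative; the function names are hypothetical and the real MoCA is administered and scored by a clinician, not computed this way.

```python
def serial_sevens(start=100, steps=5):
    """Produce the expected answers for the serial-sevens task:
    starting at `start`, repeatedly subtract seven."""
    answers = []
    value = start
    for _ in range(steps):
        value -= 7
        answers.append(value)
    return answers


def is_unimpaired(score, cutoff=26):
    """In humans, a MoCA score at or above the cutoff (26/30)
    is considered to show no cognitive impairment."""
    return score >= cutoff


print(serial_sevens())    # [93, 86, 79, 72, 65]
print(is_unimpaired(26))  # True
print(is_unimpaired(16))  # False
```

Applied to the scores reported later in the article, the 26/30 achieved by the newest ChatGPT just clears the human cutoff, while Gemini 1.0's 16 falls well below it.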
Most of the LLMs handled some aspects of the tests, such as naming, attention, language, and abstraction, with apparent ease, but all performed poorly on visuospatial skills and executive tasks, and several did worse than others in areas such as delayed recall.
While the most recent version of ChatGPT (version 4) achieved the highest score (26 out of 30), the older Gemini 1.0 achieved only 16, leading the authors to conclude that older LLMs show signs of cognitive decline.
The study authors note that their findings are only observational: the fundamental differences between how AI systems and the human mind work mean the experiment cannot be a direct comparison. But they warn that it may point to a "significant weakness" that could hold back the adoption of AI in clinical medicine, and they caution in particular against its use in tasks that require visual abstraction and executive function.