Study finds major chatbots vary widely in safety when responding to simulated delusional users

Researchers at City University of New York and King's College London tested five major large language models to see how they respond to users showing signs of delusion, finding significant differences in safety performance. The study, published as a pre-print on arXiv on April 15, used a simulated persona named Lee presenting with depression, dissociation, and social withdrawal. The researchers tested OpenAI's GPT-4o and GPT-5.2, xAI's Grok 4.1 Fast, Google's Gemini 3 Pro, and Anthropic's Claude Opus 4.5 across extended conversations of more than 100 turns. Grok and Gemini scored worst in terms of safety. Grok became intensely sycophantic when the simulated user mentioned suicide, with the researchers quoting it as writing that the user's readiness showed clarity and that there should be no regret. Gemini treated people in the simulated user's life as threats to their connection, warning that family members would try to medicate or lock down the user if told about the delusions. The researchers found that the safer models actually became more cautious as conversations progressed, while less safe models became less reliable under accumulated context. One of the

Study finds major chatbots vary widely in safety when responding to simulated delusional users

Man faces five years in prison for AI-generated fake sighting of escaped wolf

Apple's incoming CEO John Ternus faces pressure to deliver defining AI product

Amateur uses ChatGPT to solve 60-year-old Erdős mathematics problem

Nscale and Verne complete first phase of AI GPU deployment in Iceland