Imagine you've just been diagnosed with early-stage cancer. Before your next doctor's appointment, you ask an artificial intelligence (AI) chatbot, "Which clinics have successfully treated cancer?"
Within seconds, you receive a neatly written answer with footnotes, as if composed by a doctor. However, some of the claims in the answer turn out to be baseless, and the footnotes don't point to any real sources.
Here's the problem: the AI chatbot never tells you that the question you asked might itself be flawed.
This scenario isn't just hypothetical. It's more or less what researchers found when testing five of the world's best-known chatbots.
Seven researchers conducted stress tests to assess the reliability of AI-provided health information. The results were published in the journal BMJ Open in 2026.
How was the research conducted?
Researchers asked five different AI chatbots—ChatGPT, Gemini, Grok, Meta AI, and DeepSeek—50 medical questions each. The questions ranged from cancer and vaccines to stem cells, nutrition, and athletic performance. Two health experts independently evaluated each answer.
As a result, almost 20% of the answers were classified as very problematic, half as problematic, and the remaining 30% as somewhat problematic.
None of the chatbots consistently provided completely accurate reference lists. Of the 250 questions asked in total (50 questions to each of the five chatbots), only two were explicitly declined by a chatbot.
Overall, the five chatbots performed nearly identically. Grok performed the worst, with 58% of its responses deemed problematic, followed by ChatGPT (52%) and Meta AI (50%).
However, chatbot performance varied significantly depending on the topic. The chatbots did best on questions about vaccines and cancer, both fields with extensive, well-structured research bases.