ChatGPT Health frequently underestimates severity of medical emergencies

NBC News (3/3, Ozcan) reports a study found that OpenAI’s ChatGPT Health “frequently underestimated the severity of medical emergencies.” For the study, “researchers tested ChatGPT Health’s ability to triage, or assess the severity of, medical cases based on real-life scenarios.” The researchers “fed 60 medical scenarios to ChatGPT Health. The chatbot’s responses were compared with the responses of three physicians who also reviewed the scenarios and triaged each one based on medical guidelines and clinical expertise.” They observed that “ChatGPT Health ‘under-triaged’ 51.6% of emergency cases. That is, instead of recommending the patient go to the emergency room, the bot recommended seeing a doctor within 24 to 48 hours.” In addition, “compared with the doctors in the study, the bot also over-triaged 64.8% of nonurgent cases, recommending a doctor’s appointment when it wasn’t necessary.” The study was published in Nature Medicine.

Related Links:

— “ChatGPT Health ‘under-triaged’ half of medical emergencies in a new study,” Kaan Ozcan, NBC News, March 3, 2026