The Risks of Generative AI
Generative AI: A Risky Proposition for Healthcare
Generative AI, while powerful, has shown significant shortcomings in healthcare applications. A 2024 study by the University of Massachusetts Amherst and Mendel uncovered alarming rates of hallucinations in medical summaries generated by frontier AI models.
Key Findings:
- Nearly all summaries analyzed contained hallucinations.
- Inconsistencies included:
  - 327 medical event errors in GPT-4o summaries.
  - 271 medical event errors in Llama-3 summaries.
- Incorrect reasoning and chronological inconsistencies were common.
- The most frequent hallucinations involved symptoms, diagnoses, and medication instructions, posing serious risks such as misdiagnosis or incorrect treatment.
![Comparison of hallucination rates in AI-generated medical summaries: GPT-4o produced 327 medical errors and Llama-3 produced 271, with hallucinations appearing in nearly all summaries.](https://cdn.prod.website-files.com/662a97b4ad2f847215b129a9/675b4cb1b6061e62ba241c7b_image_2024-12-12_155057175.png)