Research Suggests AI Could Enhance Patient Safety, but Raises Questions

Popular artificial intelligence (AI) model GPT-4 answered 88% of questions correctly on a standardized patient safety exam.

A new study from Boston University highlighted the potential of generative artificial intelligence (AI) to improve patient safety in health care.

Published in The Joint Commission Journal on Quality and Patient Safety, the study tested the widely used AI model GPT-4 on the Certified Professional in Patient Safety (CPPS) exam, where it answered 88% of questions correctly.1 Researchers believe AI could help reduce medical errors, estimated to cause 400,000 deaths annually, by assisting clinicians in identifying and addressing safety risks in hospitals and clinics.

The study marks the first in-depth test of GPT-4’s capabilities in patient safety, focusing on its performance in key areas, including patient safety risks and solutions, measuring performance, and systems thinking. GPT-4 excelled in areas such as patient safety risks and solutions but showed weaknesses in the culture and leadership domains, especially when multiple correct answers were possible.

The study authors suggested that AI has promise in helping doctors better recognize, address, and prevent mistakes or accidental harm in hospitals and clinics.

GPT-4 answered 88% of questions correctly on the CPPS exam | Image credit: LALAKA – stock.adobe.com

“While more research is needed to fully understand what current AI can do in patient safety, this study shows that AI has some potential to improve healthcare by assisting clinicians in addressing preventable harms,” said Nicholas Cordella, MD, MSc, assistant professor of medicine at Boston University Chobanian & Avedisian School of Medicine and medical director for quality and patient safety at Boston Medical Center.2

However, the study also highlighted critical limitations of current AI technologies, including the risk of bias, fabricated data, and false confidence in responses.1 Although the exact passing score for the CPPS exam is not disclosed, the researchers believe GPT-4's score aligns with the performance of skilled human patient safety practitioners. Notably, GPT-4 displayed high confidence across all questions, even when it provided incorrect answers, expressing "high" certainty for 5 of the 6 questions it answered incorrectly.

"Our findings suggest that AI has the potential to significantly enhance patient safety, marking an enabling step towards leveraging this technology to reduce preventable harms and achieve better healthcare outcomes,” said Cordella.2 “However, it's important to recognize this as an initial step, and we must rigorously test and refine AI applications to truly benefit patient care.”

Integrating AI into patient care is a growing topic of discussion. In a separate study presented at the European Respiratory Society Congress, researchers found that ChatGPT outperformed trainee doctors in assessing pediatric respiratory diseases such as cystic fibrosis and asthma.3 Trainee doctors and 3 large language models (ChatGPT version 3.5, Google’s Bard, and Microsoft Bing’s chatbot) responded to clinical scenarios, and their answers were scored out of 9 on correctness, comprehensiveness, usefulness, plausibility, and coherence. Trainee doctors and Bing each scored 4 points, Bard scored 6, and ChatGPT scored highest with 7.

Notably, judges believed ChatGPT gave more human-like responses than those of the other chatbots, but none of the chatbots showed signs of hallucination. Both studies suggest that while AI can greatly assist clinicians, extensive testing and safeguards are needed to ensure the technology's reliability in preventing harm and optimizing care delivery.

References

  1. Cordella N, Moses J. Artificial intelligence and the practice of patient safety: GPT-4 performance on a standardized test of safety knowledge. Jt Comm J Qual Patient Saf. 2024;50:745-747. doi:10.1016/j.jcjq.2024.05.007
  2. Artificial intelligence may enhance patient safety, say BU researchers. News release. Boston University School of Medicine. September 26, 2024. Accessed September 26, 2024. https://www.eurekalert.org/news-releases/1059374
  3. Klein HE. ChatGPT outperforms trainee doctors in assessing pediatric respiratory illness. The American Journal of Managed Care®. September 9, 2024. Accessed September 26, 2024. https://www.ajmc.com/view/chatgpt-outperforms-trainee-doctors-in-assessing-pediatric-respiratory-illness