A large language model chatbot outperformed glaucoma specialists in diagnostic and treatment accuracy in glaucoma cases.
A large language model (LLM) chatbot outperformed glaucoma specialists and matched retina specialists in accuracy when presented with deidentified glaucoma and retina cases and questions, according to a study published in JAMA Ophthalmology. The finding suggests that such chatbots could serve as diagnostic aids in the future.
LLM chatbots—a form of artificial intelligence—have previously demonstrated strong performance on Ophthalmic Knowledge Assessment Program examinations, and research has begun to examine how they can be used in specific areas of ophthalmology. This study aimed to assess the chatbot's broader capabilities by comparing its accuracy with that of ophthalmologists at the attending level; fellowship-trained glaucoma and retina specialists were compared with the LLM.
The cross-sectional study took place at a single center. All data for eyes were taken from the Department of Ophthalmology at Icahn School of Medicine at Mount Sinai, New York, New York, and all specialists were practicing physicians at the same center. To test knowledge of clinical questions, the researchers selected 10 glaucoma questions and 10 retina questions from the Commonly Asked Questions of the American Academy of Ophthalmology. To test case management knowledge, 10 retina cases and 10 glaucoma cases were selected from patients in the department. All questions and patients were selected at random.
The May 12, 2023, version of the GPT-4 chatbot was used for the study. A 10-point Likert scale measured the accuracy of all answers, with 1 and 2 representing poor accuracy with unacceptable inaccuracies and 9 and 10 representing very good accuracy with no inaccuracies. A 6-point scale was used to evaluate the medical completeness of the responses.
The retina and glaucoma specialists answered the clinical questions and the case management questions, and their answers were compared with those generated by GPT-4 as the primary end point.
There were 1271 ratings for accuracy and 1267 ratings for completeness in this study. Twelve specialists were included, 8 of them glaucoma specialists and 4 retina specialists; 3 ophthalmology trainees were also included. The participants had practiced for a mean (SD) of 11.7 (13.5) years.
The LLM chatbot had a mean combined question-case accuracy rank of 506.2 vs 403.4 for the glaucoma specialists. The chatbot also ranked higher in completeness, at 528.3 vs 398.7 for the specialists. The mean ranks for combined accuracy were closer between the LLM chatbot and the retina specialists, at 235.3 and 216.1, respectively, and the mean ranks for completeness were 258.3 for the chatbot and 208.7 for the retina specialists.
“Both trainees and specialists rated the chatbot’s accuracy and completeness more favorably than those of their specialist counterparts,” the authors wrote, with specialists rating the chatbot significantly higher than their human counterparts in both accuracy and completeness.
This study had some limitations. It took place at a single center with only 1 group of attendings, which may limit its generalizability to other populations. The decision-making limitations of chatbots, especially in complex decisions, should also be considered.
Overall, this assessment found that the LLM chatbot matched or exceeded the diagnostic accuracy of retina and glaucoma specialists on both clinical questions and clinical cases, indicating its potential as a diagnostic tool.
Reference
Huang AS, Hirabayashi K, Barna L, Parikh D, Pasquale LR. Assessment of a large language model’s response to questions and cases about glaucoma and retina management. JAMA Ophthalmol. Published online February 22, 2024. doi:10.1001/jamaophthalmol.2023.6917