Patient Characteristics Affect AI Performance in Breast Cancer Screening

A US study found that a commercially available artificial intelligence (AI) algorithm for breast cancer screening produced more false-positive results in Black patients and patients with denser breasts, highlighting the importance of diverse datasets in training AI algorithms to reduce health care disparities.

Woman receiving mammogram | Image Credit: Prathankarnpap - stock.adobe.com

In a study using an FDA-approved artificial intelligence (AI) algorithm to analyze negative screening results from digital breast tomosynthesis (DBT) examinations, patient characteristics significantly affected the algorithm's case and risk scores.

A retrospective cohort study, published in Radiology, evaluated the effect of patient characteristics on the false-positive rates of a commercially available AI algorithm interpreting true-negative screening DBT mammograms obtained from January 1, 2016, through December 31, 2019.1 The study specifically focused on the algorithm's performance in predicting cancer from the current examination and in the subsequent year.

Patients were randomly selected so that the cohort included 4 racial and ethnic groups: non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, and Hispanic.

The use of AI has become increasingly popular throughout health care as a way to streamline processes and deliver care to patients more quickly. In the cancer space, radiologists often develop burnout from heavy workloads, reader fatigue, and high stress levels, all of which can cause errors in interpretation. The transition from digital mammography to DBT has added to that burden, as the average interpretation time for DBT is twice as long.

According to the Breast Cancer Research Foundation, AI methods can detect breast cancer at earlier stages, often before it is identifiable on radiologist review.2 AI can also recognize unique patterns, characteristics, and subtle abnormalities more thoroughly, allowing doctors to focus on accurate screening and reducing both false-positive and false-negative outcomes.

As Wendie Berg, MD, PhD, FACR, professor of radiology at Magee-Womens Hospital of University of Pittsburgh Medical Center, highlighted in an interview with The American Journal of Managed Care®, "The very first time you do any kind of test is the most likely to have false-positives. But once we have that prior test for comparison, it's much less likely that you will be called back for something that's just normal for you."

This is not to say that AI will replace radiologists; rather, it has potential as a collaborative tool, especially in rural or low-resource areas where populations lack access to expert care. However, the performance of AI is currently hampered by the limited size and diversity of the datasets used to train algorithms,1 in part because the FDA does not require diverse datasets for validation.

The present study analyzed case-level cancer detection scores, ranging from 0 to 100, and risk scores for screening DBT examinations. The closer a score was to 100, the more confident the algorithm was that the mammogram showed malignancy. Mammograms with scores above 49 were more likely to receive additional testing. Because every examination in the cohort was a true negative, any examination the AI algorithm flagged as positive was categorized as a false positive.
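
For illustration, here is a minimal sketch in Python of the scoring logic described above. The 49-point threshold comes from the study; the scores, variable names, and functions are invented for this example and do not reflect the vendor's actual software.

```python
# Minimal sketch of the case-score logic described above.
# The 49-point threshold follows the study's description; the
# scores and function names here are hypothetical.

SUSPICIOUS_THRESHOLD = 49  # case scores above this prompted further workup


def is_suspicious(case_score: float) -> bool:
    """Return True if an AI case score (0-100 scale) exceeds the threshold."""
    return case_score > SUSPICIOUS_THRESHOLD


def false_positive_rate(case_scores: list[float]) -> float:
    """Every exam in this cohort was a true negative, so any
    suspicious flag counts as a false positive."""
    flagged = sum(is_suspicious(s) for s in case_scores)
    return flagged / len(case_scores)


# Hypothetical scores for five true-negative screening exams
scores = [12.0, 55.5, 3.2, 71.8, 30.1]
print(f"false-positive rate: {false_positive_rate(scores):.0%}")  # 40%
```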

The study enrolled 1316 White, 1261 Black, 1351 Asian, and 927 Hispanic participants. The average age of the study population was 54 years, and most patients had either scattered fibroglandular (46%) or heterogeneously dense (29%) breasts.

About 17% of cases were classified as suspicious by the AI algorithm and thus represented false-positives in the analysis. Results showed statistically significant differences by race and ethnicity, age, and breast density (all P < .001). Black patients were more likely to have false-positive case scores (OR, 1.5) compared with White patients, whereas Asian patients were the least likely (OR, 0.7).

False-positive, or suspicious, case scores were less likely among patients aged 41 to 50 years (OR, 0.6) and more likely among patients aged 71 to 80 years (OR, 1.9) compared with patients aged 51 to 60 years.

Overall, false-positive risk scores were highest among Black patients (OR, 1.5), patients aged 61 to 70 years (OR, 3.5), and patients with extremely dense breasts (OR, 2.8) compared with White patients aged 51 to 60 years with fatty breast density.
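
As context for how odds ratios like these compare subgroups, the sketch below shows an unadjusted odds ratio contrasting one subgroup's false-positive odds with a reference group's. The counts are hypothetical, chosen only to illustrate the calculation; the study's published ORs came from the authors' own statistical models.

```python
# Sketch of an unadjusted odds ratio comparing a subgroup's
# false-positive odds with a reference group's. Counts below are
# hypothetical, not the study's data.


def odds_ratio(fp_group: int, neg_group: int,
               fp_ref: int, neg_ref: int) -> float:
    """OR = (fp_group / neg_group) / (fp_ref / neg_ref)."""
    return (fp_group / neg_group) / (fp_ref / neg_ref)


# Hypothetical counts: exams flagged (false positives) vs not flagged
print(round(odds_ratio(fp_group=300, neg_group=961,    # subgroup
                       fp_ref=224, neg_ref=1092), 1))  # reference group
# -> 1.5
```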

Several limitations exist that could affect the generalizability of these findings. The study was conducted at a single institution, potentially limiting its applicability to other health care settings. Additionally, different AI algorithms have varying thresholds for flagging suspicious cases. This variability can lead to discrepancies in results and interpretations across different health care providers.

These limitations underscore the importance of ongoing research and development in AI-assisted cancer detection. Different health care practices will require tailored approaches to optimize the benefits of AI tools while minimizing their drawbacks.

Additionally, a separate study in Sweden found an increased risk of breast cancer among women who had previously received false-positive results.3 This finding highlights the need for further refinement of AI algorithms and screening methods to deliver more accurate results and minimize unnecessary biopsies and associated anxieties.

Transparency in algorithm training is also crucial. Companies developing AI tools should be more forthcoming about the data used to train their algorithms. Relying solely on data from White patients can lead to biased results, such as higher false-positive rates for Black patients. This can exacerbate existing health care disparities and hinder the potential benefits of AI in cancer detection.

References

1. Nguyen DL, Ren Y, Jones TM, Thomas SM, Lo JY, Grimm LJ. Patient characteristics impact performance of AI algorithm in interpreting negative screening digital breast tomosynthesis studies. Radiology. 2024;311(2):e232286. doi:10.1148/radiol.232286

2. Can AI and machine learning revolutionize the mammogram? Breast Cancer Research Foundation. April 18, 2024. Accessed May 29, 2024. https://www.bcrf.org/blog/ai-breast-cancer-detection-screening/#:~:text=AI%20techniques%20can%20help%20radiologists

3. Santoro C. Swedish study links false-positive mammograms to elevated breast cancer risk. The American Journal of Managed Care. February 16, 2024. Accessed May 29, 2024. https://www.ajmc.com/view/swedish-study-links-false-positive-mammograms-to-elevated-breast-cancer-risk
