The American Journal of Managed Care
Special Issue: Health IT
Volume 30, Issue SP 6, Pages SP468-SP472

Equity and AI Governance at Academic Medical Centers

This study identifies limited engagement with equity among academic medical centers as they develop governance processes for artificial intelligence (AI)/machine learning and predictive technologies.

ABSTRACT

Objectives: To understand whether and how equity is considered in artificial intelligence/machine learning governance processes at academic medical centers.

Study Design: Qualitative analysis of interview data.

Methods: We created a database of academic medical centers from the full list of Association of American Medical Colleges hospital and health system members in 2022. Stratifying by census region and restricting to nonfederal and nonspecialty centers, we recruited chief medical informatics officers and similarly positioned individuals from academic medical centers across the country. We created and piloted a semistructured interview guide focused on (1) how academic medical centers govern artificial intelligence and prediction and (2) to what extent equity is considered in these processes. A total of 17 individuals representing 13 institutions across 4 census regions of the US were interviewed.

Results: A minority of participants reported considering inequity, racism, or bias in governance. Most participants conceptualized these issues as characteristics of a tool, using frameworks such as algorithmic bias or fairness. Fewer participants conceptualized equity beyond the technology itself and asked broader questions about its implications for patients. Disparities in health information technology resources across health systems were repeatedly identified as a threat to health equity.

Conclusions: We found a lack of consistent equity consideration among academic medical centers as they develop their governance processes for predictive technologies despite considerable national attention to the ways these technologies can cause or reproduce inequities. Health systems and policy makers will need to specifically prioritize equity literacy among health system leadership, design oversight policies, and promote critical engagement with these tools and their implications to prevent the further entrenchment of inequities in digital health care.

Am J Manag Care. 2024;30(Spec Issue No. 6):SP468-SP472. https://doi.org/10.37765/ajmc.2024.89555

_____

Takeaway Points

We identify limited engagement with equity among academic medical centers as they develop governance processes for artificial intelligence/machine learning and predictive technologies.

  • A minority of participants considered inequity, racism, or bias in governance.
  • Most participants conceptualized these issues as characteristics of a tool, using frameworks such as algorithmic bias or fairness. Fewer participants understood equity beyond the technology itself and asked broader questions about its implications for patients.
  • Disparities in health information technology resources across health systems were repeatedly raised as a threat to equity.
  • Health systems and policy makers will need to specifically prioritize equity literacy among health system leadership, design oversight policies, and promote critical engagement with these tools and their implications to prevent the further entrenchment of inequities in digital health care.

_____

Research on predictive models and the implementation of artificial intelligence (AI) and machine learning (ML) have grown significantly over the past 15 years.1,2 The literature includes multiple analyses of how these technologies can reflect or entrench existing inequities.3-5 Racism in particular has been a core focus of this literature, with multiple empirical examples of how structural racism becomes embedded in both the data used to create these models and the output they produce.6-8 However, health systems have little guidance on how to govern and evaluate predictive models, specifically with respect to inequities.9 It is currently unclear how governance can best address issues of racism and equity at the health system level, particularly in the absence of strong federal oversight focused on these concerns.

Federal agencies such as the FDA and the Office of the National Coordinator for Health Information Technology are grappling with how to regulate or monitor these evolving technologies.10,11 Some emerging AI/ML frameworks emphasize methodological quality assurance and evaluation processes, whereas others center on protecting public trust and engaging patients.12 The White House Office of Science and Technology Policy, along with the National Academies of Sciences, Engineering, and Medicine, has produced policy documents emphasizing the need for public engagement, trust, and equity.13,14 Some states have also taken action to encourage governance that ensures health information technology (IT) is unbiased. The California Attorney General, for example, initiated a statewide inquiry into racial bias in health care algorithms.15

As regulatory agencies and policy makers grapple with the pace of technological development, health systems are making decisions about which models to deploy and how to use them, with high-stakes consequences for patients. This study’s findings contribute to ongoing work focused on understanding how health systems make decisions about AI/ML and prediction and how these processes affect patients. We specifically asked whether and how academic medical centers consider racism and other inequities as they design and implement new governance policies for AI/ML and predictive technologies.

METHODS

We constructed a database of all Association of American Medical Colleges hospital and health system members as of 2022. All member organizations were included in our sampling frame except for US Department of Veterans Affairs systems (n = 40), pediatric and children’s hospitals (n = 22), and other specialty hospitals (n = 5), such as cancer centers, due to their organizational and clinical differences from other health systems in the sample. The resulting sample of 246 health systems was stratified by census region (Northeast, Midwest, South, West). Random samples of 10 systems were drawn from each region using Stata 16 (StataCorp LLC). A committee of health informatics experts reviewed the resulting sample and suggested additional academic medical centers to include based on their knowledge of the use of advanced informatics tools. Due to the emerging nature of AI/ML governance strategies, we deployed a flexible recruitment strategy to identify the most appropriate contact at each health system.
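
The stratified draw itself is straightforward to reproduce. The study used Stata 16; the sketch below shows a minimal equivalent in Python/pandas, where the file name and the “region” column are hypothetical stand-ins for the actual sampling frame:

```python
import pandas as pd

# Hypothetical sampling frame: one row per AAMC member health system,
# with VA, pediatric, and specialty hospitals already excluded and a
# "region" column holding the census region (Northeast, Midwest, South, West).
frame = pd.read_csv("aamc_members_2022.csv")  # hypothetical file name

# Draw a simple random sample of 10 systems within each census region,
# mirroring the stratified draw the study performed in Stata 16.
sample = frame.groupby("region").sample(n=10, random_state=42)

print(sample["region"].value_counts())  # expect 10 systems per region
```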

Initially, one health informatics executive or leader was identified for outreach at each academic medical center. This was typically a chief medical informatics officer or similarly positioned individual. This person received an initial email including a description of the interviewer and purpose of the research project in accordance with Consolidated Criteria for Reporting Qualitative Research (COREQ) guidelines. If this person was not directly involved in governance of AI/ML and prediction, we requested a referral to a colleague with relevant expertise in governance. That individual would then be contacted at least twice by email for recruitment. Participants were then scheduled for an interview.

A total of 17 individuals participated in interviews conducted via Zoom. The job titles of interviewees included data analytics officers, informatics officers, AI leads, and data governance directors. These 17 interviewees represented 13 academic medical centers across the country, including the Northeast (n = 4), Midwest (n = 3), South (n = 3), and West (n = 3). Interviews were conducted from October 2022 to January 2023, and each lasted approximately 30 to 60 minutes. This study was determined to be exempt by the University of Michigan Institutional Review Board. Reporting for this study follows COREQ reporting guidelines and draws on guidance for qualitative work in health informatics.16,17

Design

An original semistructured interview guide was developed for this study. The guide was piloted with 3 health informatics and governance experts based at different academic medical centers and refined based on the pilot interviews. The interview guide focused on (1) a health system’s approach to predictive model and AI/ML governance, evaluation, and deployment and (2) how decisions were made about the use of these tools. To avoid social desirability bias, the topic of inequities was reserved for the last segment of the interview. This allowed the research team to identify whether equity was a core priority in governance or whether it required a prompt from the interviewers (see the eAppendix [available at ajmc.com] for the text of these questions and prompts).

Analysis

Two members of the research team (P.N., R.H.) conducted and recorded all interviews via Zoom. Each transcript was reviewed and edited by P.N. Throughout the process, memos were written to synthesize the interview content, key concepts, and connections between insights across interviews.18,19 A memo was also written for each individual interview that contained notes, reflections, and inductive codes. The research team defined these inductive codes and combined them as needed. The initial set of codes was applied to a random set of 4 interviews by both coders (P.N., R.H.). Codes were reconciled and finalized and were then applied to all interview transcripts. Themes were identified using the analytic memos, reviews of all coded segments of transcripts, and data visualizations in MAXQDA software (VERBI GmbH).19

RESULTS

Participants representing 9 institutions mentioned concerns about equity or bias related to AI/ML and prediction. Only 1 of these participants specifically used the term racism. Participants indicated that 4 of 13 health systems in the sample included algorithmic bias or equity as priorities in their approaches to AI/ML and predictive model governance, whereas others reported it as a consideration or topic of occasional discussion. Two key themes emerged from this study related to equity and governance: (1) how equity is conceptualized or understood in the governance process and (2) the effect of structural inequalities between health systems.

Conceptualizing Equity and Bias

When they discussed equity, participants described it in 2 ways. First, and most common, was an understanding of inequity and bias focused on the technology or models themselves. This meant that individuals or their institutions were focused on specific aspects of a technology, such as the data used to train a model or its performance across demographic groups. Second, and less common, was a discussion of inequity and bias beyond the technology—in other words, in relation to the application of a model and the consequences for patients, accounting for the social and health-related realities of inequity.

Bias within the tool. For some participants, the concepts of equity and inequity were conflated with algorithmic bias or statistical fairness. Bias and fairness were discussed as characteristics of the models themselves (eg, equal performance across populations, representativeness of training data sets). One participant (participant 1) described this thinking about equity and tool function in this way: “We ask about the performance of models across racial groups. We ask [the developers] if their features include racial groups or other demographics that could be predictors for socioeconomic statuses. We ask them, ‘Why did you do that...?’ and to think through the implications and to think about whether those features might inadvertently exacerbate inequities.” This approach does not address the human decision-making related to model output. Rather, it focuses on the features of the model.

For participants who conceptualized equity as a characteristic of a tool, the questions they asked in their governance processes were focused on the data used to build a model, the predictors included, and performance across demographic groups. Another participant (participant 4) described this approach: “Many of us [who] are trying to do applied data science are very interested in algorithmic bias and equity, making sure that we’re not widening digital divides, making sure that we’re producing the same kinds of outputs and outcomes for people based on traditional health disparities, race, ethnicity, sexual orientation, gender identity, etc.…”

This reflects an understanding of inequity or bias grounded in statistical parity. Participants who conceptualized inequity as statistical bias typically did not discuss the larger social implications or contexts of the technologies they were governing, or how decisions based on model output should be weighed with respect to equity.
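
To make this model-centric framing concrete, an audit of this kind often reduces to computing a performance metric separately within each demographic group and flagging gaps. The following sketch is illustrative only, not any participating institution’s actual audit code; the simulated data, column names, and the choice of AUROC as the metric are all assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical validation set: observed outcomes, model risk scores,
# and a demographic label for each patient.
audit = pd.DataFrame({
    "y_true": rng.binomial(1, 0.2, 1000),        # observed outcome (0/1)
    "y_score": rng.uniform(0, 1, 1000),          # model-predicted risk
    "group": rng.choice(["A", "B", "C"], 1000),  # demographic group label
})

# Compare discrimination within each group against the overall value;
# a model-centric review flags groups whose performance lags behind.
by_group = audit.groupby("group").apply(
    lambda g: roc_auc_score(g["y_true"], g["y_score"])
)
overall = roc_auc_score(audit["y_true"], audit["y_score"])
print(by_group)
print(f"overall AUROC: {overall:.3f}")
```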

Bias beyond the tool. For other participants, equity was a concern beyond the technology itself. It was related to how the model was used or to what purpose it was applied. One participant (participant 10) described this in the context of a model predicting patients missing appointments: “We look at the number side, but then we also look at the implementation side. So it wasn’t the race and religion [variables] that actually tipped off the equity [concern]; it was, ‘Who is the patient it’s identifying?’ It’s identifying people who aren’t going to show up. Who are the sorts of people who don’t show up? The people who need a wheelchair, the people who don’t have transportation, who have other issues. And so it was actually, regardless of the predictors, who are we trying to identify? And if we use that to double-book, if the people show up, it’s going to be the worst people to get double-booked.… So that was the bigger concern than anything around what was going on in the model.”

This discussion of equity clarifies the distinction between a model-centric approach and a broader understanding of the greater implications of the tool. Rather than focusing strictly on the technology itself, this participant describes the importance of thinking beyond the model. The governance process they describe and the way their team and system approach a given model encourage critical questions beyond statistical properties of the tool. They are concerned with the social realities of inequity and how they are replicated by a technology, even if the predictors themselves are not overtly biased or discriminatory.

Another participant (participant 5) specifically described how they combined concerns about biased tools and the inequitable structures outside the tools from a data science perspective: “The data science team had taken into consideration bias and health equity. So they put their models through this audit process and also work very closely with our health equity office and diversity group. Is the model performance bad for a certain group because that represents indeed a health difference? Or is it because of the systems’ effect or the bias of systems?... There is an audit in the model inputs, or other proxies for race. If you provided zip code, or something like that, that ends up being a proxy for race. So we do an audit of the inputs that are being fed to the model as well.”

This health system had a clear workflow for the data scientists vetting models that included an extensive list of questions about effects by race/ethnicity and specifically required an analysis of whether a potential difference in performance was driven by health disparities or disparate health care. In doing so, they bridged the 2 concepts of equity and fairness described by participants across institutions. They addressed model-centric understandings of bias and fairness in addition to larger equity issues and potential consequences of the tools.
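
One way to operationalize the input audit this participant describes is to score how well each candidate input, on its own, predicts race/ethnicity; inputs such as zip code that predict it well above the majority-class baseline are flagged as potential proxies. The sketch below is an assumption-laden illustration, not the institution’s actual workflow, and all column names are hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_strength(features: pd.DataFrame, race: pd.Series) -> pd.Series:
    """Score each input by how well it alone predicts race/ethnicity.

    Returns cross-validated accuracy minus the majority-class baseline;
    values well above 0 suggest the input may act as a proxy for race
    even when race itself is excluded from the model.
    """
    baseline = race.value_counts(normalize=True).max()
    scores = {}
    for col in features.columns:
        X = pd.get_dummies(features[[col]])  # one-hot encode categoricals
        clf = LogisticRegression(max_iter=1000)
        scores[col] = cross_val_score(clf, X, race, cv=5).mean()
    return (pd.Series(scores) - baseline).sort_values(ascending=False)

# Hypothetical usage:
# inputs = patients[["zip_code", "age", "payer"]]
# print(proxy_strength(inputs, patients["race_ethnicity"]))
```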

System-Level Resource Inequalities

Most participants discussed the challenge of resource limitations in governing predictive models, whether applied to their own institution or to the broader health ecosystem of small community hospitals, midsize hospitals, and large health systems serving a variety of patient populations. Even among the leading academic medical centers that participated in this study, there were consistent concerns about allocating limited resources. The IT staff time required to design models or evaluate vendor-produced models was a limiting factor for most health systems. When participants considered the implications of these issues for less well-resourced health systems or hospitals, the concern about resources became even more pressing. Put another way, participants who generally had an IT team and data scientists evaluating, implementing, and managing these tools were concerned about their counterparts at smaller hospitals who did not have similar resources. Part of this concern was that smaller hospitals were left to rely on vendors to provide and evaluate models, even though participants perceived these vendors as unreliable for ensuring the models were appropriate or suitably tailored to smaller or nonacademic settings. One participant (participant 2) said: “Smaller hospitals, community hospitals, etc, they typically just take the vendor package, right? They do not want to spend any effort on this, and it is just what they have. I think it is what it is until we can get interoperable decision support that’s easier to spread. It’s kind of…almost a luxury for the places that have more resources.”

Multiple participants highlighted this structural concern whereby systems with more resources will be able to properly evaluate predictive tools and others will not. Without the ability to validate tools on their own patient populations, underresourced systems would not be able to ensure that the tools work for their patients. When participants described this concern, they often referred to the role of the vendors that produce the models and expressed that best practices involve testing and validating a model before use. One participant (participant 1) said: “What [the vendor] does not necessarily require is for the clients to undergo a clinical effectiveness evaluation because it does take a little bit more resources to look at whether a model has its intended impact clinically and in terms of processes. I would encourage other clients to at least think about, in a minimal way, if you’ve done a silent evaluation and the model’s continuously promising when you put it in the hands of the clinicians…can you do a simple evaluation of a sample of usages to see whether it led to the intended outcome? Can you do some surveys to see what people thought of it and to ask periodically whether this model should continue to be used?”

Another participant (participant 9.1 [second interview for institution 9]) said: “I would say that the hardest part of all this is figuring out how to fund it.… We have probably half of a statistician’s time and my time and [a faculty member’s] time. It’s expensive.… I went to the [vendor] meeting and heard an interesting presentation about how a community hospital is using some of the other analytical tools inside [the electronic health record]. The thing is, they’re using them straight out of the box. They’re not doing anything because they just can’t. So, anyway, it’s a little bit scary.”

When this participant said, “They’re not doing anything,” they meant that the community hospital was not doing any evaluation or adjustments based on the tools’ performance in the hospital itself. For this participant, using a vendor model this way was concerning because the models may not perform as well in a particular patient population compared with their performance on a national sample of patients or a sample of patients from a different health system.
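
The kind of local check participants have in mind can be run as a “silent” evaluation of the sort described above: the vendor model scores patients in the background without clinicians seeing the output, and local discrimination and calibration are then compared with the vendor’s reported figures. A minimal sketch with simulated stand-in data follows; the specific metrics and data are assumptions, not a prescribed protocol:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Stand-ins for data logged during a silent deployment: locally observed
# outcomes and the vendor model's predicted risks for the same patients.
y_true = rng.binomial(1, 0.15, 2000)
y_score = np.clip(rng.normal(0.15, 0.10, 2000), 0.0, 1.0)

# Discrimination on the local population, to compare against the
# vendor's reported performance on its development sample.
print(f"local AUROC: {roc_auc_score(y_true, y_score):.3f}")

# Calibration often degrades more than discrimination when a model moves
# to a new population: compare observed vs predicted event rates by bin.
obs, pred = calibration_curve(y_true, y_score, n_bins=10)
for p, o in zip(pred, obs):
    print(f"predicted {p:.2f} -> observed {o:.2f}")
```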

DISCUSSION

This study found a lack of consistent engagement with the concepts of equity, discrimination, bias, or racism as AI/ML and predictive models are deployed and governed. Although some health systems included in this study prioritized equity or bias in their governance processes, this was generally based on an individual’s personal interest or dedication rather than systemic or structural priorities.

Although a growing body of literature highlights the risk of racism and other inequities inherent in AI/ML and prediction, most participants did not describe these issues as central priorities in their governance processes. This indicates a need for improved equity literacy across the health care system. Specific guidelines and best practices that build on empirical evidence for incorporating equity audits and evaluations are also necessary. Other fields and collaborative efforts have produced recommendations for health systems to counter structural racism.20 Similarly specific guidance and best practices tailored to AI/ML, ideally building on recent transparency-focused federal regulation for predictive models,21 would improve governance in health care.

Disparities in health system resources are of serious concern. Vendor-produced tools without population validation and evaluation can replicate inequities through poor performance in a patient population. The gap between well-resourced academic medical centers and those without extensive IT capabilities requires further analysis. It also must be considered as regulation and policy evolve. Although disparities in IT funding or resources exist outside a given technology itself, they are inextricably linked to how the tool affects patients and their care. Evidence of a digital divide in advanced electronic health record use has previously been identified,22 and without explicit engagement with this kind of inequality, it is likely that the proliferation of AI/ML and prediction will replicate this pattern.

There is an urgent need to address racism and inequity in health IT broadly.23-26 Efforts to dismantle racism and other forms of inequity require coordination across policy and practice. Multiple opportunities exist for policy intervention, including oversight, regulation,27 and audits for racial bias such as the inquiry the California Attorney General has initiated.15 Policy that responds to the resource inequalities between health systems identified in this study will be particularly important to prevent a widening digital divide. Health systems, as they design and improve their governance processes, can also work to address inequities beyond statistical parity or algorithmic fairness. As described by some of the participants in this study, there are much larger questions to ask about the impact of a technology than whether it performs similarly across racial groups.

Limitations

Because this study is limited to academic medical centers, the findings do not reflect governance processes at nonacademic medical centers. The resource differences and variation in governance approaches across different kinds of health systems require further study to better understand the specific types of variation and inform policy. Additionally, the positions of the participants interviewed for this project are relevant to interpreting the findings. Perspectives of chief medical informatics officers and chief data analytics directors likely differ from those of clinicians, health administrators, and legal experts. Qualitative work that includes the perspectives of these various stakeholders will be necessary in the near future to further inform policy and regulation.

CONCLUSIONS

This study identified how AI/ML and predictive model governance at academic medical centers engages with the relationship between health information technologies and equity. It also highlighted the pressing concern of system-level disparities in IT resources and the implications for patients. We find that considerable work needs to be done to build literacy around equity, racism, bias, and fairness among leading decision makers at academic medical centers across the country. Health systems will need to prioritize capacity building in this area, in concert with greater policy guidance, to prevent further exacerbation of digital racism and inequity.

Author Affiliations: Division of Health Policy and Management, University of Minnesota School of Public Health (PN), Minneapolis, MN; Department of Learning Health Sciences, University of Michigan Medical School (RH, JP), Ann Arbor, MI.

Source of Funding: National Institutes of Health 5R01EB030492-02.

Author Disclosures: The authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.

Authorship Information: Concept and design (PN); acquisition of data (PN, RH); analysis and interpretation of data (PN); drafting of the manuscript (PN); critical revision of the manuscript for important intellectual content (PN, JP); provision of patients or study materials (RH); obtaining funding (JP); administrative, technical, or logistic support (RH); and supervision (PN, JP).

Address Correspondence to: Paige Nong, PhD, Division of Health Policy and Management, University of Minnesota School of Public Health, 516 Delaware St SE, Minneapolis, MN 55455. Email: nong0016@umn.edu.

REFERENCES

1. Panch T, Duralde E, Mattie H, et al. A distributed approach to the regulation of clinical AI. PLOS Digit Health. 2022;1(5):e0000040. doi:10.1371/journal.pdig.0000040

2. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12-22. doi:10.1016/j.jclinepi.2019.02.004

3. Adamson AS, Smith A. Machine learning and health care disparities in dermatology. JAMA Dermatol. 2018;154(11):1247-1248. doi:10.1001/jamadermatol.2018.2348

4. Barda N, Yona G, Rothblum GN, et al. Addressing bias in prediction models by improving subpopulation calibration. J Am Med Inform Assoc. 2021;28(3):549-558. doi:10.1093/jamia/ocaa283

5. Cronjé HT, Katsiferis A, Elsenburg LK, et al. Assessing racial bias in type 2 diabetes risk prediction algorithms. PLOS Glob Public Health. 2023;3(5):e0001556. doi:10.1371/journal.pgph.0001556

6. Benjamin R. Assessing risk, automating racism. Science. 2019;366(6464):421-422. doi:10.1126/science.aaz3873

7. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi:10.1126/science.aax2342

8. Dhiman P, Ma J, Andaur Navarro CL, et al. Risk of bias of prognostic models developed using machine learning: a systematic review in oncology. Diagn Progn Res. 2022;6(1):13. doi:10.1186/s41512-022-00126-w

9. Walsh CG, McKillop MM, Lee P, Harris JW, Simpson C, Novak LL. Risky business: a scoping review for communicating results of predictive models between providers and patients. JAMIA Open. 2021;4(4):ooab092. doi:10.1093/jamiaopen/ooab092

10. Office of the National Coordinator for Health Information Technology. ONC health IT standards bulletin 2022-2. July 2022. Accessed October 26, 2023. https://www.healthit.gov/sites/default/files/page/2022-07/Standards_Bulletin_2022-2.pdf

11. Clinical Decision Support Software: Guidance for Industry and Food and Drug Administration Staff. HHS, FDA, Center for Devices and Radiological Health, Center for Biologics Evaluation and Research, Center for Drug Evaluation and Research, Office of Combination Products in the Office of the Commissioner; 2022. Accessed August 8, 2023. https://www.fda.gov/media/109618/download

12. Trustworthy AI (TAI) Playbook. HHS; 2021. Accessed September 6, 2022. https://www.hhs.gov/sites/default/files/hhs-trustworthy-ai-playbook.pdf

13. Blueprint for an AI bill of rights. The White House. Accessed December 9, 2022. https://www.whitehouse.gov/ostp/ai-bill-of-rights/

14. National Academies of Sciences, Engineering, and Medicine; National Academy of Medicine. Toward Equitable Innovation in Health and Medicine: A Framework. The National Academies Press; 2023. Accessed August 25, 2023. https://www.nap.edu/read/27184/chapter/1

15. Attorney General Bonta launches inquiry into racial and ethnic bias in healthcare algorithms. News release. State of California Department of Justice Office of the Attorney General. August 31, 2022. Accessed October 13, 2022. https://oag.ca.gov/news/press-releases/attorney-general-bonta-launches-inquiry-racial-and-ethnic-bias-healthcare

16. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349-357. doi:10.1093/intqhc/mzm042

17. Ancker JS, Benda NC, Reddy M, Unertl KM, Veinot T. Guidance for publishing qualitative research in informatics. J Am Med Inform Assoc. 2021;28(12):2743-2748. doi:10.1093/jamia/ocab195

18. Creswell JW. Qualitative Inquiry and Research Design: Choosing Among Five Approaches. 2nd ed. Sage Publications Inc; 2007.

19. Deterding NM, Waters MC. Flexible coding of in-depth interviews: a twenty-first-century approach. Sociol Methods Res. 2021;50(2):708-739. doi:10.1177/0049124118799377

20. Wyatt R, Tucker L, Mate K, et al. A matter of trust: commitment to act for health equity. Healthc (Amst). 2023;11(1):100675. doi:10.1016/j.hjdsi.2023.100675

21. Everson J, Smith J, Marchesini K, Tripathi M. A regulation to promote responsible AI in health care. Health Affairs Forefront. February 28, 2024. Accessed February 28, 2024. https://www.healthaffairs.org/do/10.1377/forefront.20240223.953299/full/

22. Adler-Milstein J, Holmgren AJ, Kralovec P, Worzala C, Searcy T, Patel V. Electronic health record adoption in US hospitals: the emergence of a digital “advanced use” divide. J Am Med Inform Assoc. 2017;24(6):1142-1148. doi:10.1093/jamia/ocx080

23. Platt J, Nong P, Merid B, et al. Applying anti-racist approaches to informatics: a new lens on traditional frames. J Am Med Inform Assoc. 2023;30(10):1747-1753. doi:10.1093/jamia/ocad123

24. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight — reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874-882. doi:10.1056/NEJMms2004740

25. Beach MC, Saha S, Park J, et al. Testimonial injustice: linguistic bias in the medical records of Black patients and women. J Gen Intern Med. 2021;36(6):1708-1714. doi:10.1007/s11606-021-06682-z

26. McCall T, Asuzu K, Oladele CR, Leung TI, Wang KH. A socio-ecological approach to addressing digital redlining in the United States: a call to action for health equity. Front Digit Health. 2022;4:897250. doi:10.3389/fdgth.2022.897250

27. Ferryman K. Addressing health disparities in the Food and Drug Administration’s artificial intelligence and machine learning regulatory framework. J Am Med Inform Assoc. 2020;27(12):2016-2019. doi:10.1093/jamia/ocaa133
