The American Journal of Managed Care
Florida managed care plans observe too few outcomes of contracted cardiac surgeons to reliably distinguish quality. Pooling data across insurers or using specialist society data may help.
Objective: To assess whether managed care plans can reliably infer the quality of cardiac surgeons’ outcomes.
Study Design: Evaluation of administrative discharge data and reported health plan enrollments.
Methods: We analyzed 221,327 coronary artery bypass graft (CABG) admissions performed by 398 cardiac surgeons in 75 state-regulated hospitals in Florida between 1998 and 2006. For our outcomes quality measure, we constructed surgeon-level risk-adjusted mortality rates using demographic and comorbidity data. We also obtained managed care plan enrollments in Florida in 2005 to discern the number of patient outcomes possibly seen by any individual plan. Finally, we constructed a confidence interval around any particular surgeon’s CABG outcomes quality and tested whether the surgeon’s quality could reliably be found to be worse than benchmarks using normal approximations and exact binomial limits.
Results: Even if a plan had as high as a 50% share of a county’s managed care-insured CABG patients, none of the 86 surgeons in the 5 largest counties in Florida could confidently be judged to be of poorer than average quality.
Conclusions: In cardiac surgeons’ outcomes quality monitoring, individual managed care plans face a “law of small numbers.” Insufficient patient volume by contracted surgeons, inadequate variation in outcomes, and low levels of adverse outcomes combine to make true quality almost impossible to infer. Some mitigation may be possible through more effective use of data (more measures and pooling over time) and through more effective interorganizational sharing of data (leveraging specialist society quality data and statewide pooling).
(Am J Manag Care. 2009;15(12):890-896)
If a managed care plan attempts to use outcomes data to distinguish the quality of contracted specialist physicians such as cardiac surgeons, insufficient patient volume is likely to stymie such efforts. This holds even though cardiac surgery is the most common major surgery.
Managed care organization (MCO) insurers answer to shareholders, enrolled members, and a slew of regulators on potentially conflicting cost and quality objectives. Policy and natural experiments have shown that managed care plans contain costs but not at the expense of members’ health status or the quality of delivered care.1 An informed and experienced MCO can assess delivered care by directly tracking individual members’ health status or by indirectly tracking the quality of its providers’ outcomes.2,3 Managed care organizations can also selectively contract with higher-quality providers.4-6
Unfortunately, empirical evidence that such selective contracting occurs is thin.7 Surveys of MCO executives and models of contracting suggest that location and negotiated price far outweigh quality in contracting with hospitals.8-10 We hypothesize that this lack of selective contracting may simply reflect structural problems in this inference process. One such problem may be the well-known “law of small numbers,” namely, unstable sample means and noisy inferences of the true population variable of interest.11-13 Fragmented local markets and a lack of aggregate data on providers might contribute to such failures in information markets.14,15 When such information is unavailable, MCOs are forced to rely on their own assessments of outcomes achieved for their enrollees by providers. Even for coronary artery bypass graft (CABG) surgery, the most common major operation, there might be too few members experiencing care each year for an MCO to reliably assess or compare the realized mortality outcomes of contracted providers.
In this study, the key research question is whether MCOs are able to reliably distinguish physicians of different quality and, if not, why. We address this question by considering MCOs in Florida, where provider-level information is not publicly available. We use outcomes achieved recently by Florida cardiac surgeons to investigate the ability of MCOs to infer quality.
Recently in these pages, a similar reliability issue was investigated in primary practice process quality.16 This study complements that work by focusing on outcomes in 1 specialty. To the extent that process measures may not be easily correlated with outcomes, MCOs may place more weight on outcomes. We focus on specialists, who are arguably easier to associate with a particular patient outcome compared with the more diffuse responsibility shared by primary practice teams.
Methods
Data Sources
The validated data comprised 221,327 CABG patient records of 398 cardiac surgeons and 75 state-regulated hospitals in Florida between 1998 and 2006 (eAppendix A available at www.ajmc.com). The data were obtained commercially as deidentified discharge abstracts from Florida’s Agency for Health Care Administration. We also obtained enrollments in managed care health insurance from Florida’s Office of Insurance Regulation. This project was approved by our institution’s health system institutional review board.
Performance Data
Our main measure of performance was the mean observed in-hospital mortality rate and the mean risk-adjusted mortality rate over the year for a particular provider. Lack of data on complications, delayed mortality events, and process measures prevented their use as performance indicators. Risk adjustment allowed for comparison of providers whose patients may differ, for example, by age, preexisting illness, or the number of elective versus emergency cases. Risk adjustment tends to reduce the dispersion of individual providers’ measures. Accordingly, we expect that, whenever unadjusted mortality measures suffer from a “small numbers” problem, so too will adjusted measures.
A risk-scoring model incorporating many standard data elements was developed to compute the patient’s expected preoperative in-hospital mortality risk (eAppendix B and eAppendix C).17 A limitation of the model was the unavailability of medical record data (eg, left ventricular ejection fraction or body mass index), which would allow better adjustment for patient illness severity.
Using identifiers for surgeon and treating facility, we aggregated observed and expected in-hospital deaths for each year for each provider. We used the ratio of observed to expected mortality to obtain the risk-adjusted mortality rate, standardized by multiplying by the annual statewide mortality rate (eAppendix D).
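For concreteness, a minimal sketch of this aggregation follows. The table layout and column names (surgeon_id, died, expected_risk) are our own illustrative assumptions, with expected_risk taken to be the per-patient probability of in-hospital death produced by the risk model.

```python
import pandas as pd


def surgeon_risk_adjusted_rates(admissions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-admission outcomes into surgeon-level risk-adjusted mortality.

    Expects one row per CABG admission with (hypothetical) columns:
      surgeon_id    -- provider identifier
      died          -- 1 if in-hospital death, 0 otherwise
      expected_risk -- model-predicted probability of in-hospital death
    """
    statewide_rate = admissions["died"].mean()  # annual statewide mortality rate
    per_surgeon = admissions.groupby("surgeon_id").agg(
        n=("died", "size"),
        observed_deaths=("died", "sum"),
        expected_deaths=("expected_risk", "sum"),
    )
    per_surgeon["observed_rate"] = per_surgeon["observed_deaths"] / per_surgeon["n"]
    # Risk-adjusted rate = (observed deaths / expected deaths) x statewide rate
    per_surgeon["risk_adjusted_rate"] = (
        per_surgeon["observed_deaths"] / per_surgeon["expected_deaths"]
    ) * statewide_rate
    return per_surgeon
```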
Managed Care Plan Data
In 2005, approximately 3.5 million Florida residents were enrolled in managed care plans, and a total of 464 plans were offered by 27 firms across 66 counties (eAppendices E, F, and G). The largest 10 counties represent 59% of Florida’s population and 75% of total MCO members. We included the Medicare managed care segments, the Medicaid managed care segments, the individual managed care plans, and the 2 commercial segments of small group plans and large group plans.
Statistical Analysis
We assumed that MCOs desire to reliably infer a contracted provider’s quality compared with a geographic aggregate mean measure. To do this, we used several different statistical models. In all of these, we assumed that the underlying process yielding mortality events is a stationary binomial process (ie, akin to repeated tosses of a biased coin, yielding independent “0” and “1” values with a fixed probability).
These assumptions allow us to calculate confidence intervals for a provider’s true mortality rate around his or her observed sampled mortality rate. In the simplest method, we use normal distribution approximations to the binomial confidence intervals. This method unfortunately fails when the observed mean mortality rate happens to be 0.00 in the period examined and is not accurate when the surgeon operates on few patients (ie, if the product np of the number of admissions n and the mean in-hospital mortality p is less than approximately 5). To circumvent this, we also used 2 other approaches. The Wilson score interval method yields more accurate approximate confidence intervals around the observed mortality rate.18 Exact, possibly asymmetric, binomial confidence intervals were also calculated using SEMA software (SEMATECH, Austin, TX). For details of the exact binomial calculations and the SEMA software used for this, see the SEMATECH Web site (http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm). Neither of these 2 additional methods led to qualitatively important differences in our main results.
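As an illustration of these interval constructions, the sketch below computes the normal-approximation, Wilson, and exact (Clopper-Pearson) binomial limits for an observed mortality proportion. It is our own illustration using scipy rather than the SEMA software noted above, and the function name is hypothetical.

```python
from scipy.stats import beta, norm


def mortality_confidence_intervals(deaths: int, n: int, alpha: float = 0.10):
    """Two-sided (1 - alpha) confidence intervals for a surgeon's true
    in-hospital mortality rate, given `deaths` observed among `n` admissions."""
    p = deaths / n
    z = norm.ppf(1 - alpha / 2)

    # 1. Normal approximation: degenerate when p = 0 and inaccurate when n*p < ~5.
    half = z * (p * (1 - p) / n) ** 0.5
    normal_ci = (max(0.0, p - half), min(1.0, p + half))

    # 2. Wilson score interval.
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * (p * (1 - p) / n + z ** 2 / (4 * n ** 2)) ** 0.5
    wilson_ci = (center - half, center + half)

    # 3. Exact (Clopper-Pearson) binomial interval via beta quantiles.
    lower = beta.ppf(alpha / 2, deaths, n - deaths + 1) if deaths > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, deaths + 1, n - deaths) if deaths < n else 1.0
    exact_ci = (lower, upper)

    return {"normal": normal_ci, "wilson": wilson_ci, "exact": exact_ci}
```

A surgeon would be flagged as reliably worse than a county or statewide benchmark only when an interval’s lower limit lies above the benchmark rate, for example by testing whether mortality_confidence_intervals(2, 31)["wilson"][0] exceeds 0.035.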
We made the reasonable assumption that the average managed care plan has no access to traditional Medicare fee-for-service claims and that any such outcomes associated with their providers were thus not observed. We report findings from the last available year of common discharge and enrollment data (2005) and focus on the 5 largest counties in Florida.
Results
We show in turn the dispersion of outcomes, the estimation and inference of true quality, and then how fragmented managed care market shares imply small patient volumes and low reliability.
Dispersion of Outcomes
Although CABG-related admissions constitute the most common major surgery, the 20,449 CABG-related admissions in Florida in 2005 represented an incidence rate of only 0.11%. Few residents in any given county undergo this procedure by itself or in conjunction with valve repair or other cardiac surgery. For example, among residents from Broward County, only 1494 admissions were recorded, of which almost all were in local Broward County hospitals. About half of these admissions were insured by MCOs, the remainder by traditional Medicare fee for service.
In the Figure, we consider the 5 largest counties in Florida by CABG-related admissions. We compare and contrast risk-adjusted and unadjusted mortality rates for individual surgeons practicing in these counties, focusing only on their managed care-insured patients. In 2005, the Florida statewide mean in-hospital mortality was 3.5%, while across the 5 counties herein the mean risk-adjusted mortality rates vary from a low of 1.2% in Orange County to a high of 5.6% in Miami-Dade County. Note that we compute these performance data by excluding patients residing in county A but having their procedure in county B. Our rationale is 2-fold: there are few such patients, seen by few providers outside the home county, and hence the information potentially available to an insurer on those physicians’ aggregate performance is necessarily limited. The effect of risk adjustment, while slight, tended to reduce the number of above-average surgeons. For example, 9 surgeons in Broward County are above the county mean using unadjusted measures, while only 6 surgeons are above the county mean after risk adjustment. Birkmeyer and Dimick12 point out that risk adjustment of hospitals’ outcomes for a specific procedure did not lead to significant changes; in their data, risk-adjusted and unadjusted mortality rates were correlated at 0.95. Given the smaller individual volumes seen by surgeons, this did not hold in our data. By county, the correlation between risk-adjusted and unadjusted mortality rates ranged from a low of 0.62 to a high of 0.92, while across all 5 counties the rates were correlated at 0.60. Within each county, there is clearly substantial variability in outcomes for managed care patients, despite risk adjustment. In each county, some surgeons have a mean mortality rate of 0%; in each county, some surgeons have more than twice the county mean rate. Across these 5 counties, the average cardiac surgeon operated on 31 managed care CABG patients in the year compared with a mean total caseload of 73 patients. Medicare fee-for-service patients made up the bulk of this difference.
Reliability of Inferences
Suppose for a moment that in each county a single MCO (eg, a local county monopoly insurer) was able to view all admissions and outcomes information on each surgeon who treated county managed care CABG patients. We shall subsequently show that this is far from the reality.
In the middle columns of Table 1, we show that 32 of 86 surgeons had an unadjusted mortality rate in excess of their county’s mean, while 28 surgeons had a risk-adjusted rate that still exceeded the mean. But these are point estimates of the surgeons’ true quality. Only if the confidence interval around such a point estimate lies completely above the county mean can we reliably distinguish an outlier.
In the right-hand columns of Table 1, we show that now only 2 of 86 surgeons are assessed as being above the county mean at a 90% confidence level. When risk-adjusted measures are used, only 1 of 86 surgeons remains above their county mean at a 90% confidence level.
However, no MCO in our data has a 100% share of the county managed care enrollment. If an MCO observes fewer cases, then the reduction in sample size causes the confidence interval around the point estimates of mortality to widen considerably. This may lead to fewer surgeons being confidently identified as having higher mortality.
We simulated these scenarios and found, for example, that with a market share of 50% only 1 surgeon could confidently be assessed as above county means based on unadjusted data. Using risk-adjusted data, no surgeon could confidently be assessed as having higher mortality than the county average benchmark (eAppendix H).
How conservative is the assumption that an MCO only has 50% market share of the managed care market in a large county? It turns out that this assumption is still overoptimistic.
Fragmented Markets
Among the Florida population of a little more than 18 million in 2005, approximately 3.5 million were insured through managed care plans. There were 27 MCOs in the Florida market in the first quarter of 2005, and these insurers were present in multiple counties. For example, Aetna Health (the largest MCO insurer in Florida) had 557,674 members distributed in 28 large counties. In the 10 largest counties, the largest county MCO enrolled on average only a small share (~7%) of the county population. This membership was further fragmented across the product lines offered by each MCO, which serve large group employers, small group employers, and Medicaid and Medicare managed care members. In our analyses to date, we have made the conservative assumption that an MCO will internally aggregate provider volumes and outcomes regardless of the marketed plan. For example, surgeon A may be observed providing services to older patients in the MCO’s Medicare Advantage plan and to employed patients in the MCO’s commercial large group health maintenance organization (HMO).
In Table 2, the share of managed care enrollees held by the largest county MCO is shown to lie between 19% and 33%. This allows us to estimate, for example, that only approximately 165 of 735 managed care admissions in Broward County that year were members of the highest-share managed care insurer.
We showed earlier that, even with a 50% market share, an MCO will struggle to confidently assess surgeon performance. The far lower actual market shares render such inferences even less reliable and further weaken an MCO’s ability to act on them.
Discussion
By focusing on specialist cardiac surgeons treating 1 common disease class with a common surgical intervention, we investigated the reliability of in-hospital mortality quality measures. In our empirical setting, standard statistical techniques (mortality rates with associated sampling error and confidence intervals) were used to gauge whether particular providers had significantly poorer outcomes than average benchmarks. We showed that the small patient volumes of providers rendered such quality measures unreliable for assessing outliers. In our data, the mean managed care patient volume seen by cardiac surgeons was only 31 cases per year. Starting from this low base, we showed that MCO market fragmentation further decreased the mean caseload observed by even the largest plan in a large county.
Can MCOs or other entities assess provider quality more reliably? A series of related statistical and policy options can make better use of data, reengineer the gathering of information, and delineate responsibility for such quality monitoring.
Statistical Options to Increase Reliability
Better use of data may involve pooling different measures to obtain composite measures. Alternatively, it may involve pooling across time and geography to increase the sample size.
Pooling Across Measures. Rather than focusing on 1 outcomes measure, a composite measure may be more meaningful.19,20 More sophisticated multidimensional report cards have also been proposed.21-23 Computationally complex Bayesian techniques can also control for selection biases, namely, the tendency of unobservably sicker patients to seek out particular providers of healthcare.24 It remains unclear, however, whether in our empirical setting MCOs can gather and process the multivariate outcomes data needed for these approaches.
Pooling Across Time. Simpler alternatives may involve gathering more data on a provider over time, for example, as New York state does in its cardiac reporting program.25 The program monitors thoracic surgeon performance using 3-year moving means. A similar program is in operation in Massachusetts. The risk of “gaming” exists: surgeons may decline to operate on unobservably (to the analyst) sicker patients, believing that their ratings might decline.26 Pooling and averaging over time allows “reversion to the mean” and curtails unreliable transient results.27
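As a brief sketch of this kind of pooling (assuming a hypothetical per-surgeon, per-year table of deaths and caseloads), deaths and cases can be summed over a 3-year window before dividing, so that low-volume years do not dominate the average.

```python
import pandas as pd


def moving_mortality(yearly: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """3-year moving in-hospital mortality rate per surgeon.

    Expects one row per surgeon-year with (hypothetical) columns:
    surgeon_id, year, deaths, cases.
    """
    yearly = yearly.sort_values(["surgeon_id", "year"]).reset_index(drop=True)
    pooled = (
        yearly.groupby("surgeon_id")[["deaths", "cases"]]
        .rolling(window, min_periods=window)
        .sum()
        .reset_index(drop=True)
    )
    # Pool the counts first, then divide to obtain the moving rate.
    yearly["moving_mortality"] = pooled["deaths"] / pooled["cases"]
    return yearly
```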
Unfortunately, MCOs in states such as Florida stand to gain less by aggregating provider results over time. Two problems exist. First, there is substantial turnover in surgeons over time. Of 398 cardiac surgeons practicing from 1998 to 2006, 160 were observed to practice less than 3 years in these data. Waiting 3 years to judge quality reliably in these surgeons would not have been feasible. Second, pooling over time does not adequately compensate for the small number problem induced by small market shares. Of 71 cardiac surgeons practicing continuously over the 4 years from 2003 to 2006 in the 5 counties presented herein, 30 had a higher than expected mortality rate in 2005 after risk adjustment. Of these, 17 had a higher than expected mortality rate over the whole period from 2003 to 2006, while 13 had a lower than expected mortality rate over the whole period. But these are point estimates only; the confidence intervals were still too broad to distinguish surgeons well. Put differently, there was still insufficient volume to allow an MCO with limited market share to reliably distinguish higher mortality. Simulation results are given in eAppendix H. Unlike in New York, where the state “sees” each outcome, in Florida too few cases are seen by each MCO even over several years.
Pooling Across Geography. Even when hospitals (necessarily aggregating patients across surgeons, plans, and geographies) are monitored at the state level, informative signals are rare. The Florida Department of Health monitors and publicizes hospital (but not surgeon) risk-adjusted mortality rates for CABG procedures. In 2007, of 212 state facilities performing CABG operations, 66% had “too few cases” for a reliable analysis and 31% were “as expected.” There were no “better than expected” hospitals, and 3% were “worse than expected.” Details about Florida’s hospital monitoring are available at the FloridaHealthFinder.gov Web site (www.floridahealthfinder.gov/CompareCare/CompareFacilities.aspx).
Policy Options to Increase Reliability
Where does this leave MCOs? Several practical policy options exist for managing quality better for those insurers operating at the county level in fragmented markets.
Delegation to Providers. Managed care organizations can delegate quality assessment to peer societies, local hospitals, and surgeons. This overcomes fragmentation caused by small managed care market shares and captures the non-managed care cases, which make up about half of all cases in a typical large Florida county. The Society of Thoracic Surgeons National Database program offers such services to reporting providers at reasonable cost, with expert risk adjustment based on standardized clinical data. Managed care insurers can insist on receiving benchmarked performance data on their contracted hospitals and surgeons. They may choose to reward or incentivize the provision of such information and can question their relationships with providers who do not participate.
Delegation to Regulators. Regulators can assume responsibility for healthcare quality. This option implies the outsourcing of key strategic objectives, namely, the monitoring of supplier quality and the configuration of a high-quality network of providers. However, outsourcing responsibility carries hidden risks. Consider Florida Hospital Orlando’s experience 2 years ago: the state regulator highlighted its “worse than expected” mortality rate for heart attack victims using 1 risk-adjustment approach. In contrast, using several highly reputable competing methods, the Adventist Health System member claimed to have lower than expected mortality rates. Details about the hospital’s side of the dispute are available at the Florida Hospital Web site (http://www.flhosp.org).
Collaboration With Other MCOs. Managed care organizations can collaborate with local MCO competitors in developing quality improvement initiatives. Patient privacy and confidentiality of medical information can be safeguarded, while still allowing a more complete and reliable view of the performance of providers. Antitrust issues can be mitigated by appealing to clear public interest in quality healthcare provision. In today’s nonhealth commercial insurance markets (eg, car insurance), information flow between different market participants is commonplace.
Limitations
Administrative data of the form we used most likely represent an upper bound on what is feasibly available to the average MCO. Better risk adjustment would tend to compress the range of provider outcomes more tightly around the mean, further reducing the apparent variation in quality.
Our assumptions on managed care quality monitoring may both overstate and understate plans’ information processing capabilities. On the one hand, it is unclear how much statistical analysis is conducted by the average plan or by smaller plans in our sample. If plans use different rules not depending on confidence intervals (eg, suspend contracts for surgeons with >2 times the benchmark level of adverse outcomes), then such blunt tools may protect members from the poorest quality providers. Unfortunately, good surgeons with low volumes and runs of poor outcomes would suffer from such an approach. On the other hand, plans may informally (and unobserved by us) share information on particular providers with the ultimate objective of ensuring delivered quality of care to members.
This study focused on 1 disease class in a single state to illustrate what we conjecture, without supporting data, to be a more general finding in states with fragmented markets and limited public information on individual providers. In other states with higher managed care penetration, with more consolidated and mature markets, or with other sources of quality information, our findings may not continue to hold.
Implications
It is more likely than not that managed care will have a large role in insuring previously unprotected citizens, in addition to increasing its penetration in commercial insurance markets. The problems of inferring cardiac surgeons’ quality identified herein are real challenges that have the potential to stymie the dual aims of cost containment and quality assurance. If reliable quality assessments by MCOs are not feasible in the current market framework, innovative policy and market solutions should clearly address these challenges.
Author Affiliations: From the Department of Community and Family Medicine, Duke University, Durham, NC.
Funding Source: None reported.
Author Disclosures: Dr Huesch reports that the Duke Clinical Research Institute (DCRI) administers and warehouses the data collected by the Society of Thoracic Surgeons National Database program. While Dr Huesch has recommended peer quality assessment through programs similar to this, he has no relationship with DCRI and this article was not viewed by any DCRI staff or faculty.
Authorship Information: Concept and design; acquisition of data; analysis and interpretation of data; drafting of the manuscript; critical revision of the manuscript for important intellectual content; statistical analysis; and provision of study materials or patients.
Address correspondence to: Marco D. Huesch, MBS, PhD, Department of Community and Family Medicine, Duke University, Box 90120, Durham, NC 27708. E-mail: m.huesch@duke.edu.
1. Goldman DP. Managed care as a public cost-containment mechanism. RAND J Econ. 1995;26(2):277-295.
2. Hofer TP, Hayward RA, Greenfield S, Wagner EH, Kaplan SH, Manning WG. The unreliability of individual physician “report cards” for assessing the costs and quality of care of a chronic disease. JAMA. 1999;281(22):2098-2105.
3. Legorreta AP, Christian-Herman J, O’Connor RD, Hasan MM, Evans R, Leung KM. Compliance with national asthma management guidelines and specialty care: a health maintenance organization experience. Arch Intern Med. 1998;158(5):457-464.
4. DeParle NA. As good as it gets? The future of Medicare+Choice. J Health Polit Policy Law. 2002;27(3):495-512.
5. Flynn KE, Smith MA, Davis MK. From physician to consumer: the effectiveness of strategies to manage health care utilization. Med Care Res Rev. 2002;59(4):455-481.
6. Cutler DM, McClellan M, Newhouse JP. How does managed care do it? RAND J Econ. 2000;31(3):526-548.
7. Hannan EL. Commentary. Med Care Res Rev. 1999;56:363-372.
8. Rainwater JA, Romano PS. What data do California HMOs use to select hospitals for contracting? Am J Manag Care. 2003;9(8):553-561.
9. Schulman KA, Rubenstein LE, Seils DM, Harris M, Hadley J, Escarce JJ. Quality assessment in contracting for tertiary care services by HMOs: a case study of three markets. Jt Comm J Qual Improv. 1997;23(2):117-127.
10. Gaskin DJ, Escarce JJ, Schulman K, Hadley J. The determinants of HMOs’ contracting with hospitals for bypass surgery. Health Serv Res. 2002;37(4):963-984.
11. Dimick JB, Welch HG, Birkmeyer JD. Surgical mortality as an indicator of hospital quality: the problem with small sample size. JAMA. 2004;292(7):847-851.
12. Birkmeyer JD, Dimick JB. Understanding and reducing variation in surgical mortality. Annu Rev Med. 2009;60:405-415.
13. Shahian DM, Normand SL. The volume-outcome relationship: from Luft to Leapfrog. Ann Thorac Surg. 2003;75(3):1048-1058.
14. Dranove D, Kessler D, McClellan M, Satterthwaite M. Is more information better? The effects of “report cards” on health care providers. J Polit Econ. 2003;111(3):555-588.
15. Haas-Wilson D. Arrow and the information market failure in health care: the changing content and sources of health care information. J Health Polit Policy Law. 2001;26(5):1031-1044.
16. Scholle SH, Roski J, Adams JL, et al. Benchmarking physician performance: reliability of individual and composite measures. Am J Manag Care. 2008;14(12):833-838.
17. Marcin JP, Li Z, Kravitz R, Dai JJ, Rocke DM, Romano PS. The CABG surgery volume-outcome relationship: temporal trends and selection effects in California, 1998-2004. Health Serv Res. 2008;43(1, pt 1):174-192.
18. Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Stat Sci. 2001;16(2):101-133.
19. Timbie JW, Shahian DM, Newhouse JP, Rosenthal MB, Normand SL. Composite measures for hospital quality using quality-adjusted life years. Stat Med. 2009;28(8):1238-1254.
20. Dimick JB, Staiger DO, Birkmeyer JD. Are mortality rates for different operations related? Implications for measuring the quality of noncardiac surgery. Med Care. 2006;44(8):774-778.
21. McClellan M, Staiger DO. Comparing the quality of health care providers. Forum Health Econ Policy. 2000;3:1-24.
22. Landrum MB, Normand SL, Rosenheck RA. Selection of related multivariate means: monitoring psychiatric care in the Department of Veterans Affairs. J Am Stat Assoc. 2003;98(461):7-16.
23. Bronskill SE, Normand SL, Landrum MB, Rosenheck RA. Longitudinal profiles of health care providers. Stat Med. 2002;21(8):1067-1088.
24. Geweke J, Gowrisankaran G, Town RJ. Bayesian inference for hospital quality in a selection model. Econometrica. 2003;71(4):1215-1238.
25. Shahian DM, Normand SL, Torchiana DF, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg. 2001;72(6):2155-2168.
26. Werner RM, Asch DA, Polsky D. Racial profiling: the unintended consequences of coronary artery bypass graft report cards. Circulation. 2005;111:1257-1263.
27. Daniels MJ, Normand SL. Longitudinal profiling of health care units based on continuous and discrete patient outcomes. Biostatistics. 2006;7(1):1-15.