Article
Previous studies have revealed concerning issues with workplace wellness programs, but the industry has now crossed a line: the program chosen as the industry's best actually harmed employees.
Previous columns on wellness have highlighted rampant innumeracy, clinical implausibility, complete ineffectiveness, and mathematical impossibility in wellness industry outcomes claims. The headline of a recent lead investigative story in Slate called workplace wellness a “sham” that produces no savings. (The influential Incidental Economist had already reached a similar conclusion, expressed “many times and in many ways.”)
While these articles have revealed concerning issues, one could argue that workplace wellness is a business-to-business venture, with contracts entered into voluntarily. On that view, it is the responsibility of individual corporations and their consultants to identify and prevent questionable business practices and vendor claims, as long as employees themselves aren’t being harmed.
However, it appears that the industry has now crossed a bright line: the program chosen as the industry’s best, winning a C. Everett Koop Award from industry peers, did indeed harm employees, as shown below and as reported, without rebuttal, in STAT. This result in any program, but especially in the alleged best, should raise red flags for customers and regulators.
By way of background, the Koop Award is supposed to recognize the year’s most exemplary wellness program. However, these allegedly exemplary award winners have consistently been criticized for ineffectiveness and/or grossly inflated savings. Consider the 2015 winner, McKesson, whose controversial outcomes were newsworthy enough to reach Employee Benefit News. The company claimed millions in savings, but participants got fatter, and their cholesterol and glucose increased. Comparing the improvements and deteriorations in biometric risk factors showed no net change: both the “increased” and “decreased” columns in McKesson’s award application, reproduced below, sum to 58%, meaning there was no change in objective participant health status. (Non-participants and dropouts are not included in the chart below.)
Not achieving favorable outcomes is a business-to-business issue. McKesson elected to contract for the program, taking the risk that it might not save money or improve employee health. However, harming employees is something else altogether, and in 2016, that's what Wellsteps did to the employees of the Boise School District. (It may have been just a correlation. Causality is difficult to prove, but Wellsteps’ report claims causality throughout.)
First, we will show how Boise's employees became unhealthier both objectively, according to the trend of the biometrics, and subjectively, according to self-rated health status. Then, we will explore the specific aspects of the Wellsteps intervention that could have caused this outcome.
Objective Biometric Mean Scores Deteriorated
Wellsteps presented a summary of the changes in biometric means. It is almost always the case that, when adjusted for expected aging-related changes, the absolute number of low-risk people in a population whose risk increases roughly equals the number of high-risk people whose risk declines. This is regression to the mean, known in the field as the “natural flow of risk.”
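To illustrate why the two flows should roughly balance when nothing changes, here is a toy simulation. It is a sketch under assumptions that are not in the source: each person's underlying level is normally distributed and stable, each annual screening adds independent measurement noise, "high risk" means a reading above a single fixed cutoff, and there is no aging. The function name and all parameter values are hypothetical.

```python
import random


def simulate_risk_flow(n_people: int = 100_000, cutoff: float = 1.0,
                       noise_sd: float = 0.5, seed: int = 0) -> tuple[int, int]:
    """Toy model of the 'natural flow of risk' with no intervention.

    Assumptions (illustrative only): each person's underlying level is
    standard normal and does not change; each screening adds independent
    normal measurement noise; a reading above `cutoff` counts as high risk.
    Returns (low_to_high, high_to_low) counts between two screenings.
    """
    rng = random.Random(seed)
    low_to_high = high_to_low = 0
    for _ in range(n_people):
        true_level = rng.gauss(0.0, 1.0)
        year1 = true_level + rng.gauss(0.0, noise_sd)
        year2 = true_level + rng.gauss(0.0, noise_sd)
        if year1 <= cutoff < year2:    # low risk in year 1, high in year 2
            low_to_high += 1
        elif year2 <= cutoff < year1:  # high risk in year 1, low in year 2
            high_to_low += 1
    return low_to_high, high_to_low


up, down = simulate_risk_flow()
print(f"low -> high risk: {up}")
print(f"high -> low risk: {down}")
```

Because the population's distribution is the same in both years, as many readings cross the cutoff upward as downward, apart from random noise. A program would need to shift that balance noticeably before it could plausibly claim credit for improvement.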
However, in this case, totaling the “mean change through 1 year” figures below shows that deteriorations clearly outnumbered improvements: 5293 biometric mean readings improved, while 6397 deteriorated. As was the case with McKesson, and as is the industry-standard measurement technique, dropouts and non-participants are not counted or acknowledged in any other way.
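The arithmetic behind that comparison can be made explicit. This minimal sketch uses only the two totals quoted above; it weights every reading equally, applies no statistical test, and, like the Wellsteps figures themselves, says nothing about dropouts or non-participants. The roughly even split it is compared against is the "natural flow of risk" expectation described earlier, not a number from the Wellsteps report.

```python
# Tally of the biometric mean readings quoted in the text
# ("mean change through 1 year" figures).
improved = 5293      # readings that improved
deteriorated = 6397  # readings that deteriorated

total = improved + deteriorated
net_change = deteriorated - improved        # excess of deteriorations
share_deteriorated = deteriorated / total   # ~0.5 expected if flows balance

print(f"total readings:     {total}")
print(f"net deteriorations: {net_change}")
print(f"share deteriorated: {share_deteriorated:.1%}")
```

Deteriorations make up about 55% of the readings, an excess of 1104 over the improvements, rather than the roughly even split the natural flow of risk would predict.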
Deterioration in Self-Reported Health
Given those results, one would expect a deterioration in self-rated health as well, and indeed self-rated health showed a small but statistically significant decline. While some other self-reported health indicators improved slightly, self-rated health is the self-reported indicator most commonly used as a proxy for health, and is considered the most valid self-reported indicator of a patient’s health status and future healthcare utilization. (Once again, in the table below, dropouts and non-participants were excluded.)
Could the Wellsteps Program Have Caused Boise Employees to Become Unhealthier?
It may be that the Boise employees would have become unhealthier anyway, and that, like the McKesson program, the Wellsteps program made no difference. That conclusion is contradicted not only by the causality Wellsteps claims throughout its report, but also by several specific program features that are known or widely believed to cause health to deteriorate.
The first is overscreening. Wellsteps acknowledged in its July 11 posting that screenings should not be done annually, in order to avoid harming employees with overdiagnosis and overtreatment:
And yet 4 days later, in the July 15 write-up of its Koop Award, Wellsteps noted:
More screening reveals more abnormalities, and the more abnormalities people are told they have, the worse they may feel. This is called the “nocebo effect.”
Second, beyond the general tendency created by overscreening, this program offers a specific example of the “nocebo effect”: moralizing about alcohol consumption in any quantity. In the chart above, note that the average self-reported consumption of alcohol is 1.1 to 1.3 ounces per day, or roughly 8 to 9 ounces per week, well below the 15 ounces/week considered problematic. However, Wellsteps calls any consumption of alcohol a “high level” and a “worst health behavior.” As noted in the chart below, everyone admitting to any amount of alcohol consumption (the same group as in the chart above) is classified as having this worst health behavior.
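The daily-to-weekly conversion behind "well below" is simple, but writing it out shows the size of the gap. This sketch keeps the units exactly as the text states them (ounces) and uses only the figures quoted above; the constant and function names are hypothetical.

```python
# Convert the self-reported daily alcohol averages quoted in the text to
# weekly figures and compare them with the 15-per-week threshold the text
# cites as problematic. Units follow the text; names are illustrative.
DAYS_PER_WEEK = 7
PROBLEM_THRESHOLD_PER_WEEK = 15.0


def weekly_from_daily(daily: float) -> float:
    """Scale an average daily amount to an average weekly amount."""
    return daily * DAYS_PER_WEEK


for daily in (1.1, 1.3):  # reported range of daily averages
    weekly = weekly_from_daily(daily)
    share = weekly / PROBLEM_THRESHOLD_PER_WEEK
    print(f"{daily} per day -> {weekly:.1f} per week ({share:.0%} of threshold)")
```

Even the top of the reported range comes to about 60% of the threshold, which is hard to square with labeling this group's consumption a "high level."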
Third, paying people to change behaviors (or in the case of self-reported health habits, paying people to say they changed behaviors) is very controversial and may be counterproductive. It is also possible that the incentive-based program design itself created iatrogenic consequences due to the very high ($830) incentive payments.
Between the overscreening, the misunderstanding of the risks of alcohol, the use of powerful but possibly counterproductive incentives, and Wellsteps’ own claims of causation, it is possible, if not likely, that the program caused the deterioration in health. Even if it didn’t, and the decline was pure coincidence, no program should receive an award for these results, and no vendor should be allowed to claim improved outcomes on the basis of them.
Wellsteps’ award application revealed other issues as well. These include misattribution of savings to the program and mathematically incompatible savings figures. Wellsteps’ CEO, Steve Aldana, and the head of the Koop Award Committee, Ron Goetzel, also appear to contradict themselves and/or admit error in attempting to defend this program.
While these flaws don’t involve the specific question of harms to employees addressed in this posting, they do further confirm that this industry needs to address major issues of credibility, mathematical competence, and cost-effectiveness, as Slate quite dramatically posited.
Putting Harms to Employees in Context
While the Hippocratic Oath calls for doctors to “do no harm,” a strong argument can be made that the standard for wellness programs should be much higher, due to the financial coercion that organizations use to drive employee engagement. Specifically, virtually everything else in ambulatory healthcare requires “opting in” to actively seek medical assistance. By contrast, wellness requires employees to “opt out” in order to avoid a clinical intervention. In the Boise example, opting out costs $830 in higher deductibles and contributions.
To put this in perspective, the other health-related activities for which people are or can be penalized for “opting out” include wearing helmets, life jackets, and seat belts, and getting children vaccinated. In each case, the strength of the clinical and scientific evidence allows government paternalism to override considerations of personal choice. Wellness, as the Wellsteps example shows, does not remotely approach the level of evidentiary certainty of these other de facto personal health requirements.
Further, in the case of wellness, the government has abdicated design, implementation, and day-to-day enforcement to the employer. The other major workplace area where the government cedes these responsibilities to the employer is worker safety. However, safety requirements differ from wellness in 3 key ways:
A Regulatory Proposal
Our organization will recommend doing programs with and for employees rather than to them, and will focus on promoting well-being and avoiding bad health outcomes. Our choices and frequencies of screenings are consistent with the guidelines of the United States Preventive Services Task Force (USPSTF), the CDC, and Choosing Wisely.
The fact that Wellsteps is considered award-worthy raises the question of what harms are being done by vendors that are not award-worthy, which are presumably worse than Wellsteps. To protect employees from harms caused by such vendors, especially those embracing programs rated “D” by the United States Preventive Services Task Force, a model Employee Health Program Code of Conduct has been proffered. It basically urges wellness companies to aim to do no harm:
This Code could be the basis for a regulation. It might be too politically difficult to require wellness vendors to adhere to guidelines, owing to the support for wellness by politically powerful entities such as the Business Roundtable. Instead, a requirement of disclosure might be sufficient.
So perhaps vendors should be required either to adhere to this or a similar “do no harm” standard, or else to explain to employees why they aren’t adhering to it and obtain releases from employees stating that they understand the risks of overscreening but would like to participate in wellness anyway. Employees who decline to sign would be offered a “reasonable alternative” to the wellness program, such as healthcare education or an exercise program. While there is no guarantee that these alternatives would save money or improve health, the likelihood that exercise or education would harm employees is far lower than the demonstrated likelihood that overscreening, moralizing, or large financial penalties will do exactly that.