Designing Technologies for Value-based Mental Healthcare: Centering Clinicians’ Perspectives on Outcomes Data Specification, Collection, and Use

Daniel A. Adler; Yuewen Yang; Thalia Viranda; Anna R. Van Meter; Emma Elizabeth McGinty; Tanzeem Choudhury

PMC · DOI:10.1145/3706598.3713481 · July 2, 2025

Open Access

TL;DR

This paper explores how mental health clinicians view the use of data in value-based healthcare, aiming to improve technology design and data collection practices.

Contribution

The study introduces a new perspective on designing health technologies by centering clinicians' views on outcomes data in mental healthcare.

Findings

01

Clinicians emphasize aligning outcomes data with payment programs and care goals.

02

Opportunities exist for technologies and personal devices to enhance data collection.

03

Outcomes data can be used to hold stakeholders like insurers and social services financially accountable.

Abstract

Health information technologies are transforming how mental healthcare is paid for through value-based care programs, which tie payment to data quantifying care outcomes. But, it is unclear what outcomes data these technologies should store, how to engage users in data collection, and how outcomes data can improve care. Given these challenges, we conducted interviews with 30 U.S.-based mental health clinicians to explore the design space of health information technologies that support outcomes data specification, collection, and use in value-based mental healthcare. Our findings center clinicians’ perspectives on aligning outcomes data for payment programs and care; opportunities for health technologies and personal devices to improve data collection; and considerations for using outcomes data to hold stakeholders including clinicians, health insurers, and social services financially…

Keywords

health information technology; mental health; user-centered design; health services; implementation science; value-based care; qualitative research; passive sensing; digital phenotyping; digital biomarkers; digital mental health

Taxonomy

Topics: Digital Mental Health Interventions · Mental Health Research Topics · Mental Health and Patient Involvement

Full text

1 Introduction

Health information technologies (HITs) are transforming how health services collect, share, and use data. Electronic health records (EHRs) collect clinical data on provided treatments and patients’ health, which can be aggregated and shared with regulators [16]. Personal devices, such as smartphones and wearables, create opportunities to bring data on behavior, physiology, and well-being from everyday life into clinical care [100, 132]. These data streams are being repurposed to transform how we pay for health services through value-based care (VBC) programs, where healthcare providers (eg, hospitals, clinicians) are paid based upon the “value” of care they deliver to patients [85, 145]. VBC programs are implemented by paying providers for effectively managing patients’ health, instantiated by collecting data to quantify the quality of care providers deliver [36, 145]. VBC is well-motivated: healthcare spending is on the rise – for many reasons, including increased prescription drug and medical device costs, increased service utilization, and a global aging population [92, 129] – but increased spending is not always associated with improved health outcomes [54]. VBC programs incentivize healthcare providers to deliver treatments that simultaneously improve health while reducing service utilization and cost [95]. But, these new financial arrangements raise a variety of sociotechnical questions. For example, what data quantifies the quality of care, and how should this data be collected and managed? How should quality data hold health systems accountable to improve patient outcomes? How do we design HITs that support this process?

In this work, we explored these questions in a specific area of healthcare with longstanding quality challenges: mental healthcare. The prevalence of mental health disorders and the need for treatment continue to rise [94, 122, 141, 148]. Despite the need for high quality mental health services, there is a large gap between evidence-based practices and delivered care [64], and mental healthcare has been much slower to improve services compared to other healthcare specialties [114]. A recent review paper found that only 27% of published clinical reports describe adequate adherence to mental health clinical practice guidelines [13]. Data from the United States suggests that only half of publicly insured patients receive appropriate follow-up care after a mental health-related emergency department visit [131], and there is an inadequate number of inpatient psychiatric beds and/or care providers available to treat patients [6, 69]. Simultaneously, some psychiatric hospitals keep patients in care longer than medically necessary [126]. Researchers and policymakers have proposed that VBC programs could incentivize health systems to reduce these quality gaps by creating financial incentives to improve patient outcomes [58]. Given these challenges, in 2021, the National Committee for Quality Assurance, or NCQA – a leading organization in the United States responsible for assessing care quality – proposed a quality measurement framework for value-based mental healthcare [109]. With this proposal, the NCQA placed a shared responsibility on policymakers, health insurance companies, and healthcare providers to coordinate the definition, measurement, and management of quality. Through systems of joint accountability, the NCQA recognized that these different stakeholders must work together to improve mental health outcomes. But the NCQA’s proposal stopped short of defining what mental health outcomes these programs should use, how outcomes data should be collected, and how accountability should be shared.

We therefore studied the design of HITs that support outcomes data storage, collection, and use in VBC with an important stakeholder in this process: mental health providers, specifically practicing clinicians. We focused this study on mental health clinicians since they will play an essential role in realizing VBC. Mental health clinicians use their expertise to decide what treatments patients receive, they collect data on patients’ progress in treatment, and collected data is transformed into the quality metrics that will determine how clinicians are reimbursed for their services. Thus, VBC holds clinicians financially accountable to make treatment decisions based upon specific quality metrics, and assumes that improving metrics will improve patients’ health. Mental health clinicians have found these new forms of accountability challenging. For example, fewer than 20% of mental health clinicians practice measurement-based care (MBC) – the process of routinely collecting data to measure treatment outcomes and inform decision-making [47, 75] – citing concerns that data collection is burdensome, data collected within MBC do not effectively measure care outcomes across patients [11, 130], and data could be used punitively to influence clinicians’ pay [35, 86]. These tensions create opportunities to study with mental health clinicians what data better measure care outcomes, how these data should be collected, and the extent to which data could be repurposed to create more accountable care.

In this work, we contribute findings from a set of interviews with 30 U.S.-based mental health clinicians to explore the design space for HITs that support (1) outcomes data specification, (2) collection, and (3) use as a part of value-based mental healthcare. These three areas were inspired by Li et al.’s stage-based model of personal informatics [87], specifically the stages of preparation, collection, and action, applied in this work to study the HITs and people (mental health clinicians) that will prepare, collect, and use outcomes data in value-based mental healthcare. Our findings center mental health clinicians’ perspectives in the design of HITs supporting value-based care. Specifically, participating clinicians advocated that HITs store functional and engagement outcomes for value-based mental healthcare, which were perceived as the proximal outcomes to provided services (Section 4.1); called for investments in HITs that reduce the burden of collecting outcomes data (Section 4.2); and believed that outcomes data would need to hold providers, health insurers, and social services jointly accountable to improve care (Section 4.3). We conclude with implications for research developing (Section 5.1) and designing (Section 5.2) HITs to better align stakeholders’ – including payers, clinicians, and social services – data needs for both VBC and patient care.

2 Related Work

Our work lies at the intersection of three lines of inquiry: research on technologies supporting health services (Section 2.1), mental health data collection and storage (Section 2.2), and value-based mental healthcare (Section 2.3).

2.1 Designing Technologies for Health Services

In this work, we studied technologies that support value-based care and the delivery of health services, which encompass the people, organizations, and technology involved in healthcare delivery [65, 121]. These people and organizations include healthcare providers, the clinicians or hospital systems that provide treatments or preventive care (the “services”); as well as healthcare payers, the government agencies or private health insurance companies that pay for health services. We review specific technologies supporting mental health services in Section 2.2. To design technologies for health services, we need to confront more than the hardware or software capabilities of a specific technology, or the effectiveness of interventions that use technologies to improve health outcomes. We also need to confront sociotechnical factors that affect the implementation and effectiveness of these technologies in real-world care. Norman and Stappers categorize sociotechnical factors that affect technology implementation as political, economic, cultural, organizational, and structural [110]. Blandford states that, for health services specifically, HCI scholars should “consider stages (of identifying technical possibilities or early adopters and planning for adoption and diffusion) that are rarely discussed in HCI, but that are necessary to deliver real impact from HCI innovations in healthcare” [18]. Thus, we were motivated to improve the design of technologies supporting health services by understanding factors that affect their implementation and adoption in care.

Recently, HCI scholars have considered adopting ideas from health services research to improve both the design and effectiveness of health technologies. Scholars have considered how HCI research can integrate aspects of implementation science – the health services field examining the real-world adoption of evidence-based interventions [90]. Interviews with HCI and implementation science researchers uncovered that HCI tends to de-prioritize factors that influence long-term adoption of technologies in their initial design, including the financial incentives that affect adoption, and an understanding of how technologies support providers after implementation [37]. Moreover, HCI scholars have stated that if technologies are to impact real-world care, HCI researchers should focus on how technology is consumed in care, including developing an understanding of the technical and market incentives to use new tools [28]. Inspired by this work, we considered these aspects of adoption in the initial design of technologies that support value-based mental healthcare. Specifically, we considered how technologies can support healthcare providers – practicing clinicians – including how these technologies can be integrated into clinicians’ workflows to support care, and the financial incentives that influence HIT adoption as a part of value-based care.

2.2 Health Information Technologies for Collecting and Storing Mental Health Data

HCI, health informatics, and mental health researchers have collaborated to build health information technologies (HITs) for collecting and storing mental health data. In this work, we focus on three categories of mental health data: clinical data, active data, and passive data. Clinical data can be retrieved from electronic health records (EHRs), which record information collected during clinical visits including patient demographics, diagnoses, health and family history, treatments provided, and unstructured clinical notes [16]. That said, to protect patient privacy, not all mental health data may be contained within the EHR, and exporting EHR data for VBC may require patient consent [84, 125]. Clinical data can also be retrieved from administrative claims databases, which log diagnostic, treatment, and medication information used to bill healthcare payers [34, 72]. Clinics or hospitals may also collect measures of patient satisfaction to understand patients’ perceptions of their care [24].

Active data require active patient or clinician engagement to be collected, and can be collected with technologies that support digital surveys (eg, smartphones, iPads, computers, patient portals) and pen-and-paper questionnaires. These data include validated self-reported measures of mental health symptoms, which quantify symptom presence and/or severity for specific mental health disorders, such as the PHQ-9 for major depressive disorder [80], or the GAD-7 for generalized anxiety disorder [127]. Active data can also include clinician-rated scales, collected during clinical interviews [8]. Outside of symptoms, self-reported and clinician-rated measures can also quantify functioning, as mental health symptoms can impair functioning including cognition, mobility, self-care, and sociality [134]. Self-reported measures can also quantify how well patients and their mental health clinicians collaborate towards shared goals, complete tasks, and bond, called working alliance [56]. The discussed scales typically quantify persistent symptoms or functional impairment. Researchers have used everyday devices, such as smartphones, to collect more in-the-moment symptoms via questionnaires called ecological momentary assessments (EMAs) [61, 138]. EMAs can also collect engagement data, measuring, for example, medication adherence, or participation in behavioral interventions, such as mindfulness exercises [78, 99]. Active data can be stored in clinical records, like an EHR, but significant investments have not been made to build structured EHR fields for storing active data [114].

In addition to active data, sensors embedded in devices (eg, smartphones, wearables) and online platforms have created opportunities to collect passive data – data collected with little-to-no effort – on behavior and physiology [108]. Passive data can be used to estimate signals related to functioning, including social behaviors, mobility, and sleep [100, 119, 120], and more recently, researchers have investigated if passive data can measure engagement in therapeutic exercises [42]. Prior work has also studied whether passive data can estimate symptom severity [1, 31, 32, 98]. The use of passive data in treatment is limited: while passive data can be collected within EHRs [9, 97, 113], established clinical guidelines for passive data use in care do not exist, and use is often limited to patients who are motivated to share passive data with their healthcare provider [108].

It is challenging to identify what mental health data are most relevant to HITs in certain contexts, given their variety. Li et al. proposed a 5-stage model to work through these challenges, specifically in the context of personal informatics systems, where users collect data for self-reflection and gaining self-knowledge. These five stages are preparation, collection, integration, reflection, and action [87]. In this work, we study how HITs can support mental health outcomes data as a part of value-based mental healthcare, inspired by three out of these five stages, specifically preparation, understanding what data to collect; collection, gathering data; and action, how data is used. We focus on these three stages because they capture existing challenges to design HITs that support VBC, which we review in Section 2.3.

2.3 Value-based Mental Healthcare

The World Economic Forum defines value-based care (VBC) as a “patient-centric way to design and manage health systems” and “align industry stakeholders around the shared objective of improving health outcomes delivered to patients at a given cost” [144]. VBC intends to change how healthcare is paid for, away from fee-for-service payment models – where payers reimburse providers for the number of services they provide – towards paying for services if they deliver “value” to the healthcare system [22]. In practice, VBC is implemented by paying providers a set rate for managing patients’ health, sharing savings if specific cost or utilization targets are met, and/or by offering financial incentives for payers and providers based upon quality measures, which quantify the “value” of care [57, 145]. These changes shift some of the financial risk of healthcare from payers to providers. In fee-for-service models, providers continue to be paid as they provide more services. In VBC, providers may lose money if services cost more than set rates, specific cost/utilization targets are not met, or if care quality suffers [57, 111].
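The shift in financial risk described above can be made concrete with a little arithmetic. The following is a minimal sketch, not a model of any real program: the function names, the dollar amounts, and the 50% savings share are all hypothetical and chosen only for illustration.

```python
def fee_for_service_payment(num_services: int, fee_per_service: float) -> float:
    """Fee-for-service: the payer reimburses every service delivered."""
    return num_services * fee_per_service


def set_rate_margin(set_rate: float, num_services: int, cost_per_service: float) -> float:
    """Set-rate payment: the provider keeps (or absorbs) the gap between
    the fixed rate and what the delivered services actually cost."""
    return set_rate - num_services * cost_per_service


def shared_savings_bonus(cost_target: float, actual_cost: float, provider_share: float) -> float:
    """Shared savings: the provider earns a share of savings only when
    spending comes in under the target; no bonus otherwise."""
    savings = cost_target - actual_cost
    return max(savings, 0.0) * provider_share


# Hypothetical patient who receives 12 services costing $100 each.
print(fee_for_service_payment(12, 100.0))        # 1200.0: payment grows with volume
print(set_rate_margin(1000.0, 12, 100.0))        # -200.0: provider loses money over the set rate
print(shared_savings_bonus(1500.0, 1200.0, 0.5)) # 150.0: half of the $300 saved
```

Under fee-for-service the twelfth service adds revenue, whereas under the hypothetical set rate it adds a loss, which is the sense in which risk moves from payers to providers.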

Standardized quality measures guide payers and providers to deliver services that improve health outcomes and reduce cost. The Donabedian model categorizes quality measures into three areas: (1) structure – the material, human, and organizational resources used in care (eg, the ratio of patients to providers); (2) process – the services provided in care (eg, the percentage of patients receiving immunizations); and (3) outcomes – measuring the effectiveness of care (eg, surgical mortality rates) [5, 36, 38]. While structure and process measures are more actionable – hospital systems can hire more staff, or modify care practices – their relationship to outcomes can be ambiguous [115]. In contrast, outcome measures most clearly represent the goals of care, but can be biased by factors outside of providers’ direct control, including co-occurring health conditions that complicate treatment success [88, 115]. To reduce bias, statisticians apply a risk-adjustment to outcome measures, using regression to model expected care outcomes observed in real-world data, based upon variables known to moderate treatment effects [82]. The quality of provided health services for a specific patient can then be determined based upon whether a patient’s health outcomes exceed or underperform expectations.
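The risk-adjustment idea above (compare each patient’s observed outcome with a regression-based expectation) can be sketched in a few lines. This is an illustrative simplification under stated assumptions: it uses ordinary least squares with a single risk factor, whereas real risk-adjustment models use many covariates, and the function names and data are hypothetical.

```python
from statistics import mean


def fit_expected_outcome(risk_factor: list[float], outcome: list[float]) -> tuple[float, float]:
    """Fit outcome = intercept + slope * risk_factor by ordinary least squares."""
    x_bar, y_bar = mean(risk_factor), mean(outcome)
    sxx = sum((x - x_bar) ** 2 for x in risk_factor)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(risk_factor, outcome))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def risk_adjusted_score(observed: float, risk_factor: float, intercept: float, slope: float) -> float:
    """Observed minus expected outcome; positive means the patient exceeded expectations."""
    return observed - (intercept + slope * risk_factor)


# Hypothetical reference data: number of co-occurring conditions
# versus improvement on some functioning score.
conditions = [0.0, 1.0, 2.0, 3.0]
improvement = [10.0, 8.0, 6.0, 4.0]
intercept, slope = fit_expected_outcome(conditions, improvement)

# Patient A: no co-occurring conditions, improved by 8 points.
# Patient B: three co-occurring conditions, improved by 6 points.
score_a = risk_adjusted_score(8.0, 0.0, intercept, slope)
score_b = risk_adjusted_score(6.0, 3.0, intercept, slope)
print(score_a, score_b)  # -2.0 2.0
```

In this toy example Patient A improved more in raw terms, yet Patient B is the one who exceeded expectations once the co-occurring conditions are taken into account.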

Mental healthcare has faced specific challenges implementing VBC. Some of these challenges can be attributed to ambiguity on how to design health information technologies (HITs) that store outcomes data tying provided services to value [144]. Preparation challenges revolve around identifying standardized outcome metrics to store in HITs. Current quality monitoring programs incentivize using symptom scales as standardized care outcomes [101]. Patients often experience a unique constellation of symptoms that cut across multiple disorders (eg, major depressive disorder and generalized anxiety disorder) [11, 19, 30], making it difficult to identify a limited set of symptom scales to track outcomes across patients. Given these challenges, researchers have proposed using other data types as an alternative to symptom scales within VBC [58, 112]. For example, scholars and healthcare providers have argued that functional and engagement outcomes may be a promising alternative to symptom scales. Engagement is the proximal outcome of many mental health treatments, improved functioning is often more important to patients than symptom reduction, and functional outcomes measure treatment progress across patients living with different mental health symptoms or disorders [114, 128, 130].

In terms of data collection, it is estimated that fewer than 20% of mental health clinicians practice measurement-based care (MBC) – the process of collecting outcomes data, specifically symptom scales, and planning and adjusting treatment based on them [47, 150] – despite evidence that MBC improves outcomes [11]. MBC is usually implemented by having patients routinely self-report symptoms during clinical encounters using validated symptom scales, like the PHQ-9 for depression, or the GAD-7 for anxiety [146]. Mental health clinicians choose to not practice MBC for many reasons. Electronic health records (EHRs) often do not have standardized fields to support symptom data collection, clinicians perceive that symptom scale administration disrupts the therapeutic relationship, and clinicians are often not paid to administer symptom scales [35, 86, 112]. These barriers call for work centering mental health providers in designing HITs that effectively engage providers in outcomes data collection.

Action challenges stem from both perceptions of how outcomes data could be used in care, and challenges in attributing accountability for care. For example, clinicians are often not trained to use outcomes data in care, and worry that they will be held accountable and penalized if outcomes data reveal that their patients are not improving [35, 86]. There are also concerns that outcomes data could be gamed through biased reporting that artificially inflates performance metrics [75]. In addition, it is difficult in mental healthcare to attribute accountability to specific actors (eg, specific providers) in care systems. Mental healthcare is often “siloed” from physical healthcare, though both physical and mental health outcomes are strongly intertwined (eg, individuals living with schizophrenia suffer from chronic physical health conditions) [114]. Thus, existing value-based mental healthcare programs may hold both physical and mental health clinicians jointly accountable by sharing cost savings across different types of providers [58].

Taken together, this prior work demonstrates challenges designing HITs that support value-based mental healthcare. Integral to the design of these HITs are mental health clinicians, who are asked to participate in outcomes data collection, which clinicians have found challenging, and will be held financially accountable to the outcomes data HITs store. Given these challenges, this work centers mental health clinicians’ perspectives on how to design HITs that support value-based mental healthcare. By centering clinicians’ perspectives, we looked to gain a deeper understanding of their workflows and incentives to adopt HITs, and integrate this knowledge into the design and development of HITs supporting value-based care. The following section details the methodology used in this study.

3 Methods

We conducted interviews with mental health clinicians to explore how they would design health information technologies (HITs) that support value-based mental healthcare. Methodologically, we were inspired by work in speculative design to imagine futures where VBC is mandated, and then brainstorm with participants how HITs could support VBC outcomes data storage, collection, and use [59, 142]. In this section, we detail the study procedures, including participant recruitment (Section 3.1), background information (Section 3.2), how data was collected and analyzed (Section 3.3), and our positionality (Section 3.4). All study procedures were approved by the coauthors’ institutional review board (IRB).

3.1 Participant Recruitment

We enrolled as participants mental health clinicians, specifically practicing psychiatrists, clinical psychologists, licensed clinical social workers (LCSWs), and licensed mental health counselors (LMHCs). We intentionally recruited providers from these different clinical orientations to gather different perspectives on designing HITs [96]. Participants were recruited via a combination of convenience, purposive, and snowball sampling [41, 51]. Specifically, a recruitment email and flier were sent to staff working at academic medical centers across the United States. Recruitment emails were often forwarded to providers who worked in smaller, private practices or community health settings, to help us gain perspectives from mental health clinicians working in diverse settings, treating different types of patients. Within the qualitative tradition [21], our goal for this work was not to gather perspectives representative of mental health clinicians as a whole, but instead to deep dive with our participants into the complexities of designing HITs that support VBC.

3.2 Participants’ Backgrounds

Table 1 summarizes background information for the 30 mental health clinicians who participated in the study. This background information was collected during an intake survey, which was administered after participants provided informed consent for our study. Apart from data collected within this intake survey, we often asked participants during our study interviews to provide background information regarding their current payment arrangements. Most of our participants took traditional, fee-for-service payments (public and private), or asked their private practice patients to pay for care out-of-pocket. A few participants (eg, SW28) worked in health systems transitioning to value-based payments. Many participants were unfamiliar with VBC.

3.3 Data Collection and Analysis

All participants were asked to provide informed consent after being provided complete information about the study procedures. Interviews were held via Zoom over two 1-hour sessions attended by the first three authors, and participants were reimbursed $30 per hour for their time. The first session was a semi-structured interview where we asked clinicians about their current care practices, specifically how they used data – defined broadly, collected with or without technology – in care. We specifically asked participants about their perspectives on measurement-based care (MBC), the practice of collecting and using data in care that would power HITs supporting VBC [75]. We then asked participants further questions about how they used this data to measure care outcomes, how technology was involved in this process, and whether providers were accountable to achieve certain care outcomes. Interview questions were broad to allow for on-the-spot adaptation and probing [12].

In the second session, participants completed two design prompts. These prompts were motivated by work in speculative design [59, 143], to imagine futures where MBC and VBC were mandated and to understand how clinicians would collect and report outcomes data as a part of these programs. The first prompt asked participating clinicians to imagine a world where they were mandated to use outcomes data as a part of care, and to brainstorm what data they would prioritize. The second prompt was motivated by the five-star quality rating system used by the United States Centers for Medicare & Medicaid Services (CMS) [25]. Participating clinicians were asked to imagine that as a part of VBC, CMS wanted to design “mental health quality star ratings” to measure patient outcomes and care quality across clinics and health systems. Participants were asked to brainstorm what data should be included in this new star rating program. After responding to each prompt, we discussed with participants the data they included in their responses, and asked probing questions to further understand how HITs could support data storage, collection, and use. Full interview guides can be found in Appendix A.

Interviews were recorded with participants’ permission, transcribed by a professional service, and de-identified. Transcripts were analyzed using a reflexive thematic analysis approach adopted from [20]. This approach combined both inductive and deductive elements. Codes and themes arose from the data, but were guided by our research interests and the literature [21], specifically the stages of preparation, collection, and action from Li et al. [87]. The first author qualitatively coded all transcripts. Codes were iteratively refined, resulting in a final codebook, and all transcripts were recoded using the final codebook. Themes were developed from the codes by the first author, with support from the second and third authors who also participated in the interviews and validated that the themes represented participants’ views. The codebook used to generate each theme can be found in Appendix B.

3.4 Positionality

The first, second, and third authors are graduate students in computer and information science. These authors recruited participants, collected, and analyzed all of the data. One author is a clinical researcher and practicing mental health clinician who worked with the first author on the study protocols, and did not participate in the study. Another author is a health policy researcher, who is an expert on both digital mental health and value-based care. The final author is a researcher in computing and information science. All authors were based in the United States, and thus our findings and perspectives are greatly informed, and potentially limited by, our knowledge of the United States healthcare system.

4 Findings

Our findings highlight three themes exploring the design of health information technologies (HITs) that support outcomes data preparation, collection, and use within value-based mental healthcare. (1) With regard to preparation, participants preferred that HITs store functional and engagement outcomes for VBC, as compared to symptom scales or other outcomes data, because they believed functional and engagement outcomes were most directly tied to provided mental health services (Section 4.1). (2) Participants also perceived that data collection could be improved by investing in HITs that support standardized fields to collect mental health outcomes data, and saw opportunities for devices collecting both active and passive data to improve data collection (Section 4.2). (3) Finally, participants emphasized that actions with outcomes data must hold payers, providers, and social services jointly accountable to care outcomes, and that outcomes data need to be risk-adjusted; otherwise, providers may prioritize easier-to-treat patients who inflate outcome metrics (Section 4.3). Throughout our findings, participants are referred to with a unique identifier (eg, CP30) to maintain anonymity. These identifiers indicate participants’ clinical training (CP = Clinical Psychology, PS = Psychiatry, SW = Social Work, MC = Mental Health Counseling, FT = Family and Marriage Therapist, see Table 1). Participants referred to individuals receiving mental health services as both “patients” and “clients”, and we use these terms interchangeably in our findings.

4.1 Preparation: What Outcomes Data Should HITs Store?

A foundation of building HITs for value-based mental healthcare is determining the standardized outcomes data these technologies should store. Participating clinicians recognized the value of standardized outcomes data. As SW28 mentioned, “I think we have to have some concrete thing that’s going to say, ‘You’re getting better. This treatment is working.’ ” But participants believed it would be challenging to identify a limited set of outcomes data to use for VBC, even for a single patient or within a single disorder. Participants mentioned how patients often present in care with multiple symptoms co-occurring across disorders, and collecting data to track all of their symptoms was burdensome: “You can’t ask questions about absolutely everything. Sometimes you find the patient talks about how they’re anxious about their parents, their family and their friends. I give them a longer anxiety scale that hits social anxiety, school anxiety, separation anxiety. But I’m being forced to do all of these assessments and I’m not getting a really good reason why other than because you have to.” (SW38)

Given these complexities, we weighed with participants what outcomes data they preferred to use within VBC, and identified two themes. First, drawing upon their clinical experience, participants believed that symptom scales – for example, self-reported depression scales – would be difficult to use. Participants described that symptom scales were difficult to interpret across patients, and did not accommodate patients who identify with different cultural backgrounds (Section 4.1.1). Instead, participants preferred using a combination of functional and engagement outcomes data that they believed better reflected patients’ goals for care, were more closely connected to treatment, and were relevant across patients presenting with different disorders or symptoms (Section 4.1.2). By functional data, participants referred to data that quantified a patient’s ability to participate in day-to-day life, including their cognition, mobility, ability to work, and ability to maintain healthy relationships. By engagement, participants referred to patients’ engagement in treatment, including their ability to practice skills or behavior change exercises learned in care, take prescribed medication, or make safety plans for harmful (eg, suicidal) behaviors. Participants mainly imagined forms of data captured within clinical encounters. We discuss in Section 4.2 participants’ perspectives on using data captured both within and outside of clinical encounters for VBC.

4.1.1 Challenges Using Symptom Scales as Outcomes Data in VBC.

We began our interviews asking participants about using standardized symptom scales as outcomes data, as existing quality metrics and measurement-based care programs advocate for collecting standardized symptom scales [67, 101]. These scales quantify symptom severity for specific mental health disorders, and include self-reported symptom scales such as the PHQ-9 for major depressive disorder, or GAD-7 for generalized anxiety disorder. Symptom scores are added together to provide an overall measure of treatment progress, and could be shared with regulators as outcomes data in VBC. Participants believed symptom scales were useful for communicating patients’ diagnoses for “insurance repayment” (SW55) because they give “a common language” (CP51). CP51 also believed symptom scales were useful for understanding if “there are specific clusters of symptoms coming together to understand if I have an intervention that targets those symptoms” (CP51). But, our participants were uncomfortable using symptom scores as outcomes data within VBC, because patients have challenges interpreting and reporting symptoms. SW58 explained: “A score on the PHQ-9 can get worse because of external factors. Somebody loses a job, gets a divorce, their child is sick, these things can happen that make stress or depression feel much harder to deal with. But it doesn’t mean that the client is getting worse.” (SW58)
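To make concrete what "added together" means for a scale like the PHQ-9, the following is a minimal sketch rather than anything used in the study. It assumes the standard PHQ-9 format of nine items each rated 0 to 3, and the commonly published severity cut-points; the function names `phq9_total` and `phq9_severity` are hypothetical.

```python
from typing import Sequence

PHQ9_NUM_ITEMS = 9          # the PHQ-9 has nine items
ITEM_MIN, ITEM_MAX = 0, 3   # each item is rated from 0 to 3


def phq9_total(item_scores: Sequence[int]) -> int:
    """Add the nine item ratings into a single 0-27 total score."""
    if len(item_scores) != PHQ9_NUM_ITEMS:
        raise ValueError(f"expected {PHQ9_NUM_ITEMS} items, got {len(item_scores)}")
    if any(not (ITEM_MIN <= s <= ITEM_MAX) for s in item_scores):
        raise ValueError("each item must be rated between 0 and 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a total score to the commonly published PHQ-9 severity bands."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    if total <= 4:
        return "minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"


# Illustrative ratings only, not data from the study.
ratings = [2, 1, 2, 1, 1, 2, 1, 1, 0]
total = phq9_total(ratings)
print(total, phq9_severity(total))
```

The total is a single number, which is part of what participants objected to: it carries no information about the external circumstances (a job loss, a divorce) that SW58 describes as driving a score upward.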

We further probed participants about factors that distort symptom scores. For self-reported scales, some participants described internal and environmental factors that affect self-reporting. SW37 mentioned how patients may have “a literacy issue and do not fully gather the meaning of all the questions” or “do not feel comfortable fully disclosing their answers. We will see a discrepancy sometimes between how they fill out the form with their doctor and how they fill it out as a mental health professional.” CP30, a child and adolescent psychologist, stated that some children “just tend to kind of rate symptoms on the higher end.” One participant, a psychiatrist, described their own reporting behaviors to explain why symptom scores are difficult to interpret at face-value: “If I were to take a PHQ I would probably score highly on it, not because I’m depressed, but because when the questions say ‘You spend a lot of days not wanting to get out of bed,’ or ‘You overeat,’ and I’m like, ‘Yeah, I do, but I’m not doing it because I’m depressed, I’m doing it because I am lazy.’ The context is important. The hard numbers taken out of context aren’t fully accurate.” (PS25)

Aside from self-reports, our participants also mentioned how it was difficult to interpret clinician-rated symptom scales. SW38 would give patients “baseline assessments and my colleagues would be like, ‘Oh my god, the patient is so depressed. We have to give them this really extreme, very intense treatment.’ ”, but the participant challenged their colleagues to see symptom scores as “a piece of a whole picture.” Participants described how they would cross-reference symptom scales with other providers to improve their understanding of patients. One participant, who treated patients with emergent psychotic symptoms, stated that they spend “30 minutes dissecting what patients have said in different providers’ offices, trying to figure out if they’ve crossed the threshold to a first psychotic episode” (CP35). Another participant mentioned that providers would report, for the same patient, different levels of symptoms quite frequently: “There’s been multiple times where I am rating somebody at lower risk and another clinician rates them at higher risk and it’s the same day and program. What do you do about that? How do you work? How do you provide the right treatment?” (SW28)

Participants also believed that symptom scales did not accommodate patients from different cultures. One participant mentioned how “there is stigma around sharing one’s mental health and so in the hospital system where I work, there are people from so many different cultural backgrounds”, and that symptom scales would be “aligned and more accurate for people who are open and coming from a cultural background where there’s open discussion of mental health and symptomology” (SW37). Another participant described that symptom scales were not developed inclusively, and the “evidence-based is pretty self-selecting” (CP34). We probed this participant further to understand how this might impact using symptom scores as outcomes data: “I feel mixed about this because the way we have developed these measures and the people on whom they have been developed for. They’re just not always accurate, inclusive, or culturally appropriate. I don’t really see a world where they fully capture the clinical picture for somebody.” (CP34)

4.1.2 Participants Preferred using Functional and Engagement Outcomes Data.

The prior section describes various challenges participating clinicians saw using symptom scales as outcomes data supporting VBC. Given this, we asked our participants for their perceptions regarding alternative types of outcomes data that could be used. Some participants mentioned using measures of patient satisfaction, but perceived that satisfaction could be biased by aspects of care unrelated to health outcomes, such as “if the patient likes the hospital’s food” (PS23). We also asked our participants if care utilization data extracted from EHRs – such as psychiatric hospitalizations – were useful outcomes data, but participants were wary of creating a culture that discourages utilization: “If I’m being measured on how many of my clients go to the emergency room, I don’t care. Not that I don’t care, but, if going to the emergency room was the best decision for that client, what am I going to do?” (SW50). Other participants brought up working alliance scales – that measure the patient-clinician relationship – but described alliance not as an outcome of care, but an important aspect “at the beginning of care because you’re trying to get that buy-in for treatment” (CP51).

Instead, participants advocated for using a combination of functional and engagement data as outcome measures, and saw these data types as more aligned with patients’ care goals (examples in Table 2). CP30 mentioned how impaired functioning was “why a lot of people seek treatment. They feel like something is messing up their life in some way. Their goal is to be able to go to school, hang out with friends, spend time with family, whatever it is.” Another participant, who treated individuals living with obsessive-compulsive disorder (OCD), found that “symptom relief itself is not terribly motivating for most people. If you are hamstrung by fears of household chemicals, nobody wakes up in the morning and says, ‘Oh, boy, I can’t wait to get used to these household chemicals.’ ” but instead they “really focus more on functional gains” and “my goal is to get somebody out of the house and interacting with friends” (FT60).

We asked participants to explain why functional improvements were not captured by symptom scales. In other words, if functioning improves, why should we not expect symptoms to decrease? Participants explained that symptom reduction was not the singular outcome of treatment, but treatment intends to improve functioning even if symptoms persist. For example, PS24, a psychiatrist treating patients living with schizophrenia, mentioned how they “tend to have very chronic patients where the goal isn’t to get rid of symptoms, but the goal might be to make symptoms interfere with their life less” and their patients may “have ongoing voices and paranoia, but they’ve gotten to the point where they’re able to ignore the voices and attend work.” SW58 agreed, stating that “when I think about somebody that experiences psychosis or bipolar disorder or depression, you may have this for your whole life. If the goal is to have fewer symptoms, am I setting you up to fail from the start?” and they work with patients to understand “given what your life is, how do you want to live? Maybe there’s specific things, maybe you want to go back to school.”

Furthermore, participants believed that functional outcomes were likely to improve if their patients engaged in care, and saw engagement as the most proximal outcome of care. For example, many of our participants were psychotherapists who asked patients to practice specific skills or change behavior as a part of treatment. FT60 mentioned how they “had somebody who was washing their hands a hundred times a day and driving his family nuts with accommodations” and they had their patient “use judicial safety behaviors to play with his daughter who’s crawling around on the floor and then take a shower afterwards.” A few months into treatment the “patient is still washing his hands a hundred times a day, but he and his family are tons happier than they were. They’re raving about how well they’re functioning and working together now” (FT60). CP45 mentioned how for patients with “panic disorder, I’d want to have some behavioral data on what they are avoiding or how frequently they are getting out of the house, depending on the specifics of that person.” Another participant mentioned how, by tracking engagement, they might feel more confident that “someone having passive suicidal thoughts would have no intent to act on them” because they “have a supportive family that they communicate to and a safety plan in place” (CP46). CP43 saw treatment as successful if patients consistently engaged in care, even if symptoms were not fully reduced: “Their [symptom] scores do cut in half, but don’t move much beyond that and stay relatively stable. If they practice their skills and those are well-developed, they got what I am aiming to provide for them. Sure their scores aren’t zero, but that might just be because of their personality, environment, social context.” (CP43)

Unlike symptom scales, participants saw functioning and engagement as “trans-diagnostic” (CP42), measuring care outcomes across patients experiencing different symptoms or disorders. FT60 qualified that measuring symptoms was not irrelevant, but called for “a shift from symptom-focused metrics to patient-focused metrics, which can include the symptoms.” CP33 wanted to prioritize engagement outcomes for complex cases, giving an example of “a patient who had comorbid substance use disorder, PTSD [post-traumatic stress disorder], borderline personality disorder, there’s a lot of suicidality, a lot of very, very intense mood, depression and anxiety” that “those intense things, really intense urges, really intense depression, that didn’t go away” but the patient “developed trust and she kept coming to therapy. She missed, maybe, four sessions all year. Those are therapeutic gains. She internalized some hope that progress is possible.” CP30 further explained the importance of engagement and functional outcomes across conditions: “If someone has one depressive episode, they will likely have another episode. Someone who has generalized anxiety disorder may always be a more anxious person. Someone with obsessive-compulsive disorder may always be vulnerable to intrusive thoughts. It doesn’t mean they’ve failed treatment if they can tolerate the anxiety or cope with the depression, go to work, get out of bed, shower, do the things you have to do, using the skills you learned in therapy.” (CP30)

4.2 Collection: How Can HITs Support Outcomes Data Collection?

Our first set of findings describes how participants preferred that HITs prioritize storing functional and engagement data as outcomes in VBC. We then explored with participating clinicians how this data should be collected. Our related work suggests that existing mental health data are not collected by clinicians because they perceive scale administration as burdensome, administration takes time away from treatment, and mental health clinicians are not always trained to use outcomes data in care. Participants affirmed that data was not collected, stating that “it’s hard to get the buy-in from clinicians who don’t have that initial training if they have no reason to do it” and “if you are a clinician who does not care and doesn’t have buy-in, it’s really easy to let it slide off” (CP35). PS25 stated that “it feels like it’s hard to seamlessly integrate scales into a session.” Another participant stated that data collection is “extra work and we’re not getting paid for it” (CP48).

Given these complexities, we asked participants about the barriers they saw towards engaging clinicians in data collection, and how HITs could improve outcomes data collection. First, our participants described existing challenges using HITs to collect and manage outcomes data (Section 4.2.1). Specifically, they described that current clinical data infrastructure, namely electronic health records (EHRs), was not designed to collect mental health data, and it was difficult to acquire funding to improve data infrastructure. Participants also believed that to engage clinicians in VBC data collection, any mandated data collection should be client-specific, easy to administer, and relevant to decision-making. In addition, participants saw opportunities (Section 4.2.2) for active and passive data, collected via devices (eg, smartphones, tablets, wearables), to improve engagement in data collection. But, participants believed that for passive data to be used as VBC outcomes, there would need to be evidence demonstrating that passive data accurately measures the outcomes of care. Participants also raised practical challenges towards active and passive data use. Participants wanted to understand who would pay for data collection devices and clinicians’ time spent interpreting data, how this data would be integrated into clinical records, and were concerned that prioritizing data collected via devices could increase inequities in care.

4.2.1 Existing Challenges Collecting Outcomes Data.

We probed participants to understand current challenges towards engaging clinicians in outcomes data collection. Participants described challenges with existing HITs, specifically the electronic health records (EHRs) used in clinical settings to store and share patient health information, including outcomes data in VBC. Many participants explained how EHRs were not built for mental health data collection. For example, CP35 mentioned that “in the medical center, there’s just so much red tape. I know they’re trying to integrate outcome measures into our EHR system, but it’s so challenging.” Participants often operated outside of the EHRs used by other clinicians in their health system. Therefore, they would “transcribe scales in the EHR ourselves”, so that other care providers could access patients’ mental health data, but this was “so open to human error of typing in the numbers” and thus “while I want scales to be written in the discharge summary paperwork other providers are getting, that doesn’t always happen” (CP46). SW49 had personal experience advocating for investments in mental health data collection infrastructure. Before becoming a clinician, they had worked as the director of data analytics and research at a health system serving patients living with eating disorders. They described: “We had built data infrastructure, we were getting data on a weekly basis for all of our own patients across the system, and we were just starting to really incorporate that data into treatment planning and reporting data at the end of care to various stakeholders. I was there for three and a half years, and then my position was eliminated in a merger. This is an unfortunate aspect of mental health in particular, and especially in the eating disorder space. The margins are really thin and nobody wants to invest in data.” (SW49)

Given these challenges, many of our participants chose to not use EHRs for data collection. For example, SW49 described how their clinic would use “analog means (eg, recording and sharing measures on paper), which were terrible, but the analog means were more successful.” Some participants, specifically those in private practice, could not afford EHRs, and used other software for collecting and managing data. CP34 described that they “have an Excel sheet for every client” and “I just plug scale scores in, and then I graph them over time.” Another participant, SW17, practiced a form of psychotherapy, called dialectical behavioral therapy (DBT), that requires patients to fill out detailed “diary cards” before each clinical encounter. They described how “for every patient we do an EHR note” but “DBT ends up being very detailed. I only put general data in the body of my EHR note. But for the actual diary cards, I give patients a paper binder and I keep the binders in my office in a locked cabinet” (SW17). Since clinicians do not enter detailed data into EHRs, the data entered into the EHR were often incomplete, missing important information from clinical encounters. One participant mentioned how missing data could be harmful: “A lot of people get lazy about using the EHR and might just type in their note something general like, ‘the score was this’ but not actually record all the individual answers. It’s important to know where people scored on specific symptoms. For example, on a depression scale, you want to know, are people scoring really highly on suicidality?” (PS25)
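PS25's concern, that a note recording only "the score was this" loses the individual answers, can be illustrated with a small record type that stores item-level responses and derives the total from them. This is a sketch under stated assumptions, not a description of any EHR: the `ScaleRecord` class, its field names, and the example identifier are all hypothetical, and the only external fact relied on is that item 9 of the PHQ-9 asks about thoughts of self-harm.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class ScaleRecord:
    """One administration of a scale, keeping every item response."""
    patient_id: str                 # hypothetical identifier format
    scale: str                      # e.g. "PHQ-9"
    administered_on: date
    item_scores: Tuple[int, ...]    # one entry per item, in scale order

    @property
    def total(self) -> int:
        # The total is derived, so it can never disagree with the items.
        return sum(self.item_scores)

    def item(self, number: int) -> int:
        """Return the response to a 1-indexed item number."""
        if not 1 <= number <= len(self.item_scores):
            raise IndexError(f"{self.scale} has no item {number}")
        return self.item_scores[number - 1]


# Illustrative values only, not data from the study.
record = ScaleRecord(
    patient_id="example-001",
    scale="PHQ-9",
    administered_on=date(2024, 1, 15),
    item_scores=(1, 1, 0, 2, 0, 1, 0, 0, 2),
)

print(record.total)    # 7
print(record.item(9))  # 2: an elevated self-harm item despite a low total
```

Storing only the total of 7 would hide the elevated response on item 9, which is the kind of detail PS25 wants other providers to be able to see.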

The prior paragraphs detail challenges but also the necessity to invest in and design HITs for collecting and managing mental health data, supporting data sharing between clinicians and other stakeholders as a part of VBC. Aside from improving these HITs, participants also described that they would be more likely to engage in outcomes data collection if data were more effectively tied to care. For example, SW38 described how mandated data were often not relevant for their patients. They stated that “some of the assessments I have to do because we’re a community-based clinic, but I don’t really want to ask a 15-year-old about their heroin use habits if that’s not something that is relevant, but I have to.” Another participant worried about the burden of outcomes data collection, stating that “if CMS were to require this data, it’s important to do an audit of clinicians’ other paperwork requirements when they consider the cost of adding these measures” (CP42). Therefore, to encourage data collection, participants wanted scales to be client-specific, easy to administer, and relevant for decision-making. CP34 stated that the scales they choose to use are “a jumping off point for interventions. They take very little time, they probably take 30 seconds at the beginning of every session. And they’re just really integrated into each session as a way to check in and give the person ownership over what they feel like is bothering them the most.”

4.2.2 Opportunities for Technology to Improve Outcomes Data Collection.

Given these challenges, we brainstormed with participants how functional and engagement outcomes could be patient-specific, easy to administer, and integrated into care. Participants raised how they used technology to collect patient-specific active data relevant for care. CP44 stated that they “had clients FaceTime loved ones for meals” to demonstrate engagement in care. Another participant mentioned how “if a patient’s goal was, ‘I need to exercise more’ ” their patient “could take a photo at the gym” (SW37). Other participants thought active data collection could be difficult to enforce. CP34 stated that “in most of my training it’s been hard to even get people into treatment. I did some EMA data collection in grad school with people who were using substances, and it was just really, really challenging.”

Some participants identified opportunities for passive data to reduce the burden of outcomes data collection. For example, PS23 described how, for their patients, “panic is probably the one thing that you can see a lot of for PTSD, where you end up having physiological stress from your illness. A wearable will show an increase in heart rate, an increase in blood pressure, perhaps an increase in sweating, breathing, and respiration rate.” Another participant wrote that activity data could be useful “because whether we’re talking about somebody who’s depressed or we’re talking about somebody where there’s some health adherence problems, let’s say it’s following a cardiac healthy lifestyle or even anxiety, physical activity may be relevant” (CP45). PS53 was interested in collecting language data because “language is what we use to treat and diagnose.”

Despite participants’ interest in passive data, they did not believe that passive data, at face-value, could be an outcome measure in VBC. Even though passive data could make data collection easier, participants saw that interpreting passive data could be challenging. CP42 stated that “I have no training in interpreting sleep data to know what’s normal versus not.” Another participant mentioned how “on the physiological data, I thought about going back to discrepancies between what patients say or how they’re behaving with me. I would need to reflect on, either with them or by myself with my supervisors, and say, ‘What does this data mean?’ ” (PS53). Participants also raised privacy concerns with passive data collection, stating “patients have to be down with the device collecting the data and a clinician seeing all of that data just from a privacy perspective” (CP42) and for language data that there would be “some resistance, from clinicians actually more than patients being recorded and things like that. It can be high-liability information” (PS53).

Another consideration regarding the use of passive data in VBC was validation: that passive measurement tools are able to accurately measure care outcomes. For example, if there were a VBC functional outcome focused on quantifying sleep improvements, participants wanted assurances that devices could accurately measure sleep. CP35 mentioned that their clinic chose to use more expensive research-grade actigraphs, versus consumer activity monitors, because they believed that consumer devices were not as well validated. Specifically, they said that “we used actigraphs. And not just like your Apple Watch, but well-validated actigraphs, because I learned your smartwatches are not well validated for telling you if you’re sleeping when you’re supposed to be sleeping” though “it would be a lot easier if you could use what somebody is already wearing” (CP35). Another participant was unwilling to use passive data, and would prefer using self-reported scales in the absence of rigorous validation: “I’ve been intrigued by the promise and disappointed by the execution of devices. I hear from sleep researchers that unless you’re getting a very expensive device that’s really closely tracking you, your Apple Watch is not doing a great job estimating how deep is your actual sleep. It’s probably capturing general trends. Unless the technology improves, I’d be really okay with just having a self-report.” (CP43)

Participants raised other practical challenges towards integrating both active and passive data into VBC. CP35 raised the need for payment mechanisms to reimburse for devices and interpreting data, stating that currently “we didn’t bill separately for the watch. The clinic operated at a loss if the patient didn’t bring the watch back.” Another participant wanted sleep data to be integrated into existing health records, stating “I would love if a portal integrated sleep data” (CP42). Other participants worried that an over-reliance on devices for active and passive data collection would cater to higher resourced individuals who could afford devices and share data. CP34 stated that “the quality of the data really does, in my opinion, skew sometimes towards the higher resourced individuals” and SW28 mentioned that “there’s value in behavioral markers. But, in our setting, which is a hospital, a lot of patients are not going to have an Apple Watch or a smartphone. Patients don’t even have WiFi.”

4.3 Action: How Should Outcomes Data be Used in VBC?

The prior section suggests opportunities to invest in HITs, and use active/passive data to improve clinicians’ engagement in VBC outcomes data collection. Once outcomes data has been collected, VBC programs use this data to create financial incentives that hold providers accountable to achieve specific care outcomes. Participants, generally, recognized the need for more accountability. One participant explained that “there are incentives to keep your patient caseload the same when you’re in private practice, because it’s a lot of work to do intakes, and you get comfortable with the people you see. And so, if there’s a piece of your reimbursement that’s tied to meeting an outcome and then discharging and starting anew, it also holds you more accountable” (CP35). Participants also raised that VBC could give patients more control over care decisions. FT60 stated that “many providers convince patients that they’re failing treatment” and CP35 continued: “it’s really hard to know, as a consumer, whether or not you’re seeing somebody whose skills actually back up what they say. Value-based care could help you steer whether or not you go to somebody.”

In this section, we describe both challenges and opportunities participants perceived towards using outcomes data in VBC. First, participants stressed that outcomes data would need to hold providers, healthcare payers, and social service organizations jointly accountable if VBC were to fulfill its promise of improving mental health outcomes (Section 4.3.1). Second, participants voiced the need for HITs to implement risk-adjustments to outcomes data; otherwise, clinicians may prioritize treating simpler patient cases that inflate care outcomes (Section 4.3.2).

4.3.1 Using Outcomes Data for Joint Accountability in VBC.

After participants brainstormed what outcomes data HITs should store, we asked them how this data should be used in VBC programs to improve care. Specifically, we were interested in who should be held accountable: payers, providers, or other entities? Participating clinicians quickly pointed out that it would be very difficult to attribute the outcomes of care to any one specific entity. Though providers would love to take credit if outcomes did improve, that was not always possible: “I don’t really care what’s causing the improvement. If that’s due to my intervention, great. And if not, still great because they’re feeling better. But I think that tying outcomes to something very specific is too complex. There’s too many extraneous and confusing variables to ever do that.” (CP51)

Participants gave many examples highlighting the need for joint accountability, where responsibility for care outcomes is shared across different providers, or external entities that influence mental health. For example, CP46 worked in an adolescent inpatient psychiatry unit treating patients in crisis. In their view, crisis care may not translate into long-term outcomes, explaining that “patients may feel totally better because they’re in the hospital removed from all the stress and problems of life. Once they leave, I suspect their symptoms would increase. One week or month here doesn’t solve the patient’s way of approaching life” (CP46). To solve this challenge, PS23 believed that clinicians providing inpatient and outpatient services – the long-term care patients receive after discharge from inpatient – should be held jointly accountable: “you could look at across the system, but may not be able to look at for the individual provider. From the patients with depression that we treat as inpatients who then go to our outpatient setting, 85% still meet criteria for remission after one month. That tells us, okay, we as a health system are doing something right.” PS53 believed that physical health providers should be accountable for mental health outcomes. They stated that “as I’m doing community psych, I’m learning more that outcomes involve physical care, especially if people can’t move around, so we need integration” (PS53).

Outside of holding providers jointly accountable, participants also described how external entities greatly influenced care outcomes. One participant raised how health insurers should be held accountable, because “insurers reimburse clinicians so poorly, so the care is not going to be high quality a lot of the time. It’s almost like this circular reasoning issue” (CP34). Another participant thought that social services, for example housing or education authorities, should also be held accountable for poor mental health outcomes. They stated that “in settings like city hospitals, I wish there was more measurement around what interventions have been effective in reducing stressors related to housing, food, and educational access” (CP34). Because social factors, like housing, could influence care, some participants described taking a more active role in linking patients to social services. SW57 mentioned how “some patients might be looking for mental health housing, so I’ll say, ‘listen, if you want to bring the paperwork into your next session and spend your session with that, we can absolutely do that.’ ” CP46 and PS23, though, both expressed frustration that social factors could lead to poor care outcomes, and providers, not social services, may be held accountable. As they expressed: “A patient could be in foster care, lose their foster home and become stuck in inpatient care for two months. That outcome, the length of stay, has nothing to do with how much better they were and everything about the systems serving them.” (CP46) “We get frustrated when we see a 30-day readmit but then we understand the patient is homeless and it’s 30 degrees outside and someone stole their medication.” (PS23)

4.3.2 Risk-Adjusting Outcomes Data to Encourage Fair Treatment.

To penalize and reward stakeholders within VBC, participants voiced that the outcomes data HITs share with payers or quality monitoring organizations need to be risk-adjusted: adjusting expected care outcomes based upon the difficulty of the patient case. Otherwise, participants believed that VBC could disincentivize clinicians from treating tougher cases, since avoiding those cases would inflate outcome metrics. CP42 stated “how long one person needs treatment differs from how long another person needs treatment. And having a strict outcome can mean that people aren’t getting enough treatment.” Another participant echoed this concern, saying “it’s kind of unfair if you’ve got someone treating more severe people to be like, ‘Oh, you suck at your job because you couldn’t get your people down to that level.’ So the challenge is, what do you want your outcome to be?” (CP43). FT60 put these concerns more starkly: “I’m extremely nervous about the impact on care because it’s going to turn the clinician against the patient in favor of boosting their scores. I’m fine with outcome measures being exposed to the consumer so that they can make an intelligent decision as to where they want seek care. But I have real concerns about using them for reimbursement criteria or access to care as a consequence.” (FT60)

We approached our participants to understand how HITs should perform risk-adjustments to reduce these potential harms. If outcome metrics were symptom or functional scores, one participant thought about using “individual changes or change scores rather than absolute zeros. A person who comes in really significantly depressed who moves mild to moderately depressed [on a symptom scale] is a big change. But if you just use absolute scores, that might not reflect that treatment works” (CP35). CP42 suggested that change scores should be relative to patients’ baselines, stating that “if one patient’s symptom severity were a 10, but they came into me starting at a 24 that would show amazing improvement. Whereas if someone’s usually the happiest person in the world and they’re now feeling a little depressed, that might be a notable change” (CP42).
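The arithmetic behind these two quotes can be sketched in a few lines of Python. This is only an illustration: the function name and the choice of percent change are assumptions made here, and only the 24-to-10 example comes from CP42. It assumes a symptom scale on which lower scores mean fewer symptoms.

```python
def change_from_baseline(baseline: float, current: float) -> dict:
    """Summarize a current symptom score relative to a patient's own baseline.

    Assumes lower scores mean fewer symptoms, so a positive change is an
    improvement. Percent change is one possible normalization, not a
    validated risk-adjustment method.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative change")
    absolute_change = baseline - current
    percent_change = 100.0 * absolute_change / baseline
    return {
        "absolute_score": current,
        "absolute_change": absolute_change,
        "percent_change": percent_change,
    }


# CP42's example: a patient who started at 24 and is now at 10.
severe_start = change_from_baseline(baseline=24, current=10)

# Hypothetical contrast: a patient whose usual score is 2 and is now at 6.
mild_start = change_from_baseline(baseline=2, current=6)
```

In this sketch the first patient has the higher absolute score (10 versus 6) but a change of 14 points in the direction of improvement, while the second patient has a lower absolute score but a negative change, which is the distinction the participants draw between absolute and baseline-relative scores.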

We also asked participants about specific factors risk-adjustment models should account for to estimate expected care outcomes. PS32 thought models should account for other conditions patients are living with, stating that they can affect mental health outcomes data: “I have a patient who needed to have bariatric surgery because it’s hard to manage their appetite. They have sleep and energy problems that are related to the chronic pain and fibromyalgia. All those other conditions besides depression already bring the PHQ pretty close to a 10, if not higher than a 10” (PS32). Another participant mentioned that models should account for patients’ history of mental illness. Specifically, they “tell people starting on a medication that if this is the first time they’ve been treated for depression or anxiety, I recommend once your symptoms have been alleviated that you stay on the medication for about six months. If they come off the medication at that time there’s a 30% risk of relapse. If this is your second time with an episode of depression or anxiety, the chances of a third relapse are 70%. The goalposts get moved a little bit” (PS23). PS23 also raised how social factors influence care outcomes, because “in the last place I worked, 70% of our patients were homeless. These are people with a high level of needs, a high level of trauma and stress.” CP33 agreed, saying that “I wouldn’t expect someone who’s coming in with a really severe depression, who has multiple stressors, and maybe less resilience factors, fewer support system, all kinds of things, to come out of that at the same place as someone who has had a relatively supportive and stable household.”

In addition to using these factors to moderate expectations, participants also believed these factors should moderate the expected length of treatment, because “I wouldn’t necessarily expect progress to happen in the same way or in the same timeframe” (CP33). One participant stated that “obviously we hope that all of our clients improve. Maybe, over a longer period of time we start to see improvement because you had a long period of therapy. But I don’t know that timeframe” (CP44). Yet, not all participants were convinced that patients need to be in care for a long period of time to see improvement. As one participant stated: “I would expect change. I would 100% disabuse the notion that you need to be in therapy for years to see progress and instead show that a lot of people can get better from just a few sessions.” (CP43)

Discussion

5

Our findings center mental health clinicians’ perspectives on how to develop HITs that support both the goals of value-based care and providers’ individual care needs. In this discussion, we describe the implications of these findings for future research on developing (Section 5.1) and designing (Section 5.2) HITs supporting value-based care. These implications, contextualized with our findings, are summarized in Table 3.

Developing HITs that Support Value-based Mental Healthcare

5.1

Developing Data Infrastructure.

5.1.1

Our findings in Section 4.1 advocate for developing HITs for VBC that store a suite of symptom, functional, and engagement data. Within this suite of outcomes data, how do we distill what data is most useful? VBC programs could allow for a bundle of interlinked outcomes data: a set of data types validated for VBC, of which clinicians and patients can select a subset to monitor based upon individual care needs. This “bundle of data” is motivated by the concept of a personalized data pipeline from personal informatics literature, where individuals have control of what health data they collect and monitor, as well as how this data is analyzed based upon their specific goals [29, 76, 140]. The data used for VBC could also evolve over time as care needs change [4, 26, 124]. For example, a standardized symptom scale could be used at the beginning of care for screening, diagnosis, and treatment selection. Engagement and functional outcomes could then be used to monitor progress in treatment. To be specific, a clinician may administer a Y-BOCS – a standardized symptom scale – to screen a patient with suspected obsessive-compulsive disorder (OCD) [52]. Then, using examples from Table 2, a clinician could engage a patient in behavior change exercises to reduce obsessive behaviors, like handwashing. Exercise engagement could be monitored by having a patient self-report how often they wash their hands, or by using a wearable device that passively senses handwashing. The objective of reduced handwashing could be to improve a specific functional outcome important to a patient, such as their ability to spend time with family and friends. This functional outcome could be tracked using selected questions from a functional measure, like the WHODAS [134], or using passive data on phone use and communication. This example shows how a bundle of data could give patients and clinicians the flexibility to develop personalized VBC outcome data pipelines that are most meaningful for care.
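One way to picture the “bundle of data” is as a small data structure: a program-level catalog of validated measures, from which a clinician and patient select a subset. The Python sketch below is a hypothetical illustration only. The class names, the catalog contents, and the active/passive labels are assumptions drawn from the OCD example above, not a specification from any VBC program.

```python
from dataclasses import dataclass, field
from enum import Enum


class OutcomeType(Enum):
    SYMPTOM = "symptom"
    FUNCTIONAL = "functional"
    ENGAGEMENT = "engagement"


class CollectionMode(Enum):
    ACTIVE = "active"    # e.g., a scale or self-report
    PASSIVE = "passive"  # e.g., device-sensed data


@dataclass(frozen=True)
class Measure:
    name: str
    outcome_type: OutcomeType
    mode: CollectionMode


# Hypothetical catalog of measures a VBC program has validated.
VALIDATED_MEASURES = {
    m.name: m
    for m in [
        Measure("Y-BOCS", OutcomeType.SYMPTOM, CollectionMode.ACTIVE),
        Measure("handwashing self-report", OutcomeType.ENGAGEMENT, CollectionMode.ACTIVE),
        Measure("wearable-sensed handwashing", OutcomeType.ENGAGEMENT, CollectionMode.PASSIVE),
        Measure("WHODAS selected items", OutcomeType.FUNCTIONAL, CollectionMode.ACTIVE),
        Measure("phone communication patterns", OutcomeType.FUNCTIONAL, CollectionMode.PASSIVE),
    ]
}


@dataclass
class PatientBundle:
    """The subset of validated measures chosen for one patient's care."""

    patient_id: str
    selected: list = field(default_factory=list)

    def select(self, name: str) -> None:
        # Only measures in the validated catalog may be added to a bundle.
        if name not in VALIDATED_MEASURES:
            raise KeyError(f"{name!r} is not in the validated catalog")
        measure = VALIDATED_MEASURES[name]
        if measure not in self.selected:
            self.selected.append(measure)

    def by_type(self, outcome_type: OutcomeType) -> list:
        return [m.name for m in self.selected if m.outcome_type is outcome_type]


bundle = PatientBundle("patient-001")
bundle.select("Y-BOCS")
bundle.select("wearable-sensed handwashing")
bundle.select("WHODAS selected items")
```

The design choice worth noting is the separation between the shared catalog, which keeps bundles comparable across clinics, and the per-patient selection, which keeps data collection tied to individual care goals.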

Yet, if clinics collect different data, how do we enable a standardized data sharing infrastructure? Individual clinics may choose to collect certain types of outcomes data for VBC based upon device constraints, scale administration infrastructure, or patients served. In addition, patients’ data sharing preferences may limit what data can be extracted from EHRs for VBC [84, 125]. From a technology perspective, this calls for building federated HITs, where data is securely collected and stored locally at a hospital or clinic, and VBC program administrators (e.g., health insurers, the government) only access specific data types through a virtual repository, with patients’ permission [14, 89].
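To make the federated idea concrete, the following Python sketch shows data held in per-clinic stores, with a query function standing in for the virtual repository. Everything here is a hypothetical illustration: the class and function names, the in-memory storage, and the per-data-type consent model are assumptions, and a real system would also need authentication, auditing, and secure transport.

```python
from collections import defaultdict


class ClinicStore:
    """Outcomes data held locally at one clinic (hypothetical structure)."""

    def __init__(self, clinic_id: str) -> None:
        self.clinic_id = clinic_id
        self._records = defaultdict(dict)  # patient_id -> {data_type: value}
        self._consent = defaultdict(set)   # patient_id -> data types approved for sharing

    def record(self, patient_id: str, data_type: str, value) -> None:
        self._records[patient_id][data_type] = value

    def grant(self, patient_id: str, data_type: str) -> None:
        self._consent[patient_id].add(data_type)

    def revoke(self, patient_id: str, data_type: str) -> None:
        self._consent[patient_id].discard(data_type)

    def shareable(self, data_type: str) -> list:
        """Return values of one data type, only for patients who permitted it."""
        return [
            records[data_type]
            for patient_id, records in self._records.items()
            if data_type in records and data_type in self._consent[patient_id]
        ]


def virtual_repository_query(clinics: list, data_type: str) -> dict:
    """What a program administrator sees: one data type, filtered by consent."""
    return {clinic.clinic_id: clinic.shareable(data_type) for clinic in clinics}


clinic = ClinicStore("clinic-a")
clinic.record("p1", "phq9", 12)
clinic.record("p2", "phq9", 7)
clinic.record("p1", "sleep_hours", 5.5)
clinic.grant("p1", "phq9")  # p1 shares symptom scores but not sleep data

phq9_view = virtual_repository_query([clinic], "phq9")
sleep_view = virtual_repository_query([clinic], "sleep_hours")
```

In this sketch the administrator never reads a clinic's raw records directly. Each query returns only the requested data type, and only for patients whose current consent covers it, so revoking consent changes what later queries return.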

From a sociotechnical perspective, clinicians and health systems should better engage patients in what data is being shared with program administrators, and the benefits of collecting and sharing data for VBC [81]. Processes such as dynamic consent could engage patients on what data is being shared and with whom, as collected data types change over the course of care, or health insurance coverage changes impact who data is shared with [73, 108]. This calls for continued research on consentful interfaces that better elicit patients’ preferences on health data sharing and use [63, 102, 133]. By negotiating data use, patients may be more comfortable participating in VBC data collection despite the additional data work. In addition, this data collection could empower patients with data to show their clinicians what care decisions are working or not working. Further research with patients is needed to explore preferences for data sharing and use as a part of VBC.

Developing Passive and Active Care Outcomes.

5.1.2

Our findings in Section 4.2.2 suggest collecting symptom, functional, and engagement data both passively and actively. A decade of research in human-computer interaction, ubiquitous computing, and digital mental health has studied how a combination of active and passive data can be used to measure behavioral and physiological signals associated with symptoms of mental illness. This research has focused on conditions including depression [2, 105, 147], anxiety [32], bipolar disorder [49], and schizophrenia [138, 139]. Our findings advocate for continuing this behavioral tracking work, centering how passive and active data can measure functional and engagement outcomes contextualized to specific interventions in care. For example, Evans et al. recently explored how passive data can measure engagement in therapeutic exercises for treating post-traumatic stress disorder [42]. Many other clinical interventions involve engaging in behaviors that could also be measured with a combination of active and passive data, including medication adherence, reducing avoidance behaviors [60, 68], or regulating sleep and wake cycles [48].

One challenge that arose from our findings is that participants preferred using more expensive, research-grade devices to measure behaviors related to functioning (Section 4.2.2), for example, fine-grained sleep. Participants believed that research-grade devices were more rigorously validated than less expensive and more ubiquitous consumer devices. From an equity perspective, absent reimbursement mechanisms that pay for research-grade devices in care, researchers and companies could prioritize publishing and disseminating data validating that lower-cost, consumer devices accurately measure fine-grained behaviors associated with functioning. In addition, VBC programs should be cautious about mandating device-driven data collection, which may be difficult for specific populations (e.g., people experiencing homelessness) that cannot easily access devices for care.

Furthermore, future research could focus on evaluating how passive and active data change as individuals engage in care. HCI and digital mental health researchers studying active and passive mental health measures often focus on non-clinical populations, for example, students [104, 105, 137] or information workers [33, 106]. In addition, administrative claims, which track prescriptions and treatments billed to payers, could also track engagement in care, though researchers need to validate that billed claims measure actual engagement (e.g., a prescription could be billed even though a patient does not take their medication) [83, 149]. Thus, future work designing HITs supporting VBC could collect a suite of clinical, active, and passive data using EHRs, claims databases, research-grade and consumer devices, and quantify expectations for how this data changes for different types of patients as they engage in specific clinical interventions.

Developing Risk-Adjustment Methods.

5.1.3

Finally, our findings in Section 4.3.2 describe that participating clinicians wanted HITs to risk-adjust outcomes data used in VBC. Otherwise, participants were concerned that providers could “game” outcomes by prioritizing simpler cases. These concerns are valid: a working paper from the National Bureau of Economic Research suggests that pay-for-performance programs encourage providers to not treat high-risk dialysis patients [15]. Though risk-adjustment could reduce gaming, other scholars argue that risk-adjustment could increase inequities in care by normalizing inferior treatment outcomes for more difficult cases [66]. We see opportunities for HCI research to work closely with patients, providers, and health economists to develop risk-adjustment methods that do not increase inequities. For example, metrics measuring the fairness of machine learning and AI models, such as equal opportunity, equalized odds, or demographic parity [55, 70, 79], could inform risk-adjustment models by quantifying differences in expected treatment outcomes across sensitive groups (e.g., race, gender). In addition, expected treatment outcomes would need to be developed for different outcome data “bundles”, resulting in risk-adjustment paradigms for the different passive and active metrics that may be used in care.
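As a rough illustration of how two of these fairness metrics could be computed for a risk-adjustment model, the Python sketch below treats the model's output as a binary prediction of whether a patient is expected to improve. The function names, the binary framing, and the toy data are all assumptions made for illustration. They are not results from this study, and real risk-adjustment models typically produce continuous expectations.

```python
def _rate(values: list) -> float:
    return sum(values) / len(values)


def demographic_parity_difference(y_pred: list, groups: list) -> float:
    """Largest gap between groups in the rate of 'expected to improve' predictions."""
    rates = []
    for g in set(groups):
        rates.append(_rate([p for p, gg in zip(y_pred, groups) if gg == g]))
    return max(rates) - min(rates)


def equal_opportunity_difference(y_true: list, y_pred: list, groups: list) -> float:
    """Largest gap between groups in true positive rate.

    Among patients who actually improved (y_true == 1), how often did the
    model expect improvement (y_pred == 1)? Groups with no patients who
    improved are skipped.
    """
    tprs = []
    for g in set(groups):
        preds_for_improved = [
            p for t, p, gg in zip(y_true, y_pred, groups) if gg == g and t == 1
        ]
        if preds_for_improved:
            tprs.append(_rate(preds_for_improved))
    if not tprs:
        raise ValueError("no group has any patients with y_true == 1")
    return max(tprs) - min(tprs)


# Toy data: 1 = improved (y_true) or expected to improve (y_pred).
y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_difference(y_pred, groups)
eo_gap = equal_opportunity_difference(y_true, y_pred, groups)
```

A gap near zero on either metric would indicate that the model sets similar expectations across groups. A large gap, as in this toy data, would flag the concern raised above that risk-adjustment may normalize lower expectations for some groups.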

Finally, investing in measurement-based care training and ongoing consultation programs for clinicians [91], or developing methods to triangulate rating scales with, for example, passive behavioral data, may result in VBC metrics that are more robust to variable clinician rating practices. These triangulation methods could help resolve discrepancies between how patients or clinicians rate symptoms in care and observable behavior. For example, HCI researchers have imagined how passive measures could act as digital collateral – reifying clinician or patient ratings to get a more complete picture of treatment progression [40, 43]. Accumulating, de-identifying, and sharing outcomes data across clinics for research could enable triangulation methods to create more robust quality metrics across data types and rating practices [3].

Taking a Stakeholder-Centered Perspective to Design HITs Supporting Health Systems

5.2

Accounting for Health System-Level Design Challenges.

5.2.1

In this work, we were confronted with health system-level design challenges for HITs. First, our findings in Section 4.2.1 describe challenges participants encountered funding HITs that support mental health data collection, which could be solved if providers, health system administrators, and payers were incentivized to invest in and create sustainable funding streams for these HITs. Mental healthcare in the United States remains underfunded compared to physical healthcare, despite parity laws, and many mental health specialty care providers do not take health insurance [17, 71, 116]. In addition, mental health has lagged behind other specialties in implementing HITs [75]. Technology adoption incentives have often excluded mental health providers. For example, the HITECH Act in the United States offered financial incentives to providers that implemented EHRs, but excluded nonphysician providers, including the clinical psychologists and social workers that make up a large part of the mental health workforce [7, 23, 93, 117]. In addition, EHR implementations are expensive, inhibiting smaller provider practices from adopting EHRs [118]. This may partially explain why many participants in our study – many of whom were clinical social workers and psychologists working in small practices – did not use HITs, or why the HITs they used did not contain standardized fields to store mental health outcomes data.

Second, participants described that VBC programs should enforce joint accountability (Section 4.3.1), where payers, providers, and social services coordinate care and are all held financially accountable for care outcomes. Current approaches towards joint accountability are not straightforward. In the United States, different organizations rate the quality of care for different stakeholders involved in health service delivery. For example, the NCQA publishes quality data used to accredit specific health insurance plans [103], while CMS publishes quality data on healthcare providers [27]. Though health plans and providers are rated independently, their ratings are intertwined: health plan ratings are affected by the outcomes of service providers, and providers’ outcomes are limited by services health plans cover. Government social services may also be impacted by low-quality care. Poor social services, leading to a lack of, for example, housing and employment opportunities, can worsen health [10]. This leads to higher acute healthcare utilization, particularly among individuals receiving government health insurance – who in the U.S. are primarily elderly, lower-income, or on disability [74] – increasing public healthcare expenditures.

More direct approaches to implementing joint accountability would reward or penalize all providers, payers, and social services as population health improves or worsens. This would force these stakeholders to coordinate care to improve health outcomes. One example of more direct joint accountability was the Hennepin Health Accountable Care Organization (ACO), a health insurance plan that coordinated and shared healthcare cost savings across different organizations providing social and mental health services [135]. Enrollees in the ACO had more consistent primary and mental healthcare utilization and improved quality of life [135], while also demonstrating some, though non-significant, cost savings [136]. Yet, health economists have warned that forcing health systems to assume responsibility for social services could disproportionately burden low-resourced health systems [50]. However, if risk were shared across health systems, higher-resourced health systems would be incentivized to invest in communities served by lower-resourced health systems, alleviating some of this burden.

Designing for Health Systems in HCI.

5.2.2

We see these challenges as opportunities for HCI research that critically engages with how funding and health system-level data shape the design of HITs. HCI scholars have advocated that HCI focus on aspects of financing and stakeholder coordination that influence the effectiveness of HITs [18, 28]. For example, the challenges we have identified could be framed as a goal misalignment challenge [77, 123], where the data collection goals for VBC program administrators – monitoring health system-level outcomes and costs – are not always aligned with the goals of patients and clinicians, to monitor individual-level progress in care. Personal informatics researchers have shown that individuals are more likely to engage in data collection when it aligns with their specific goals [39], and designing technologies to account for the diversity of users’ goals can improve the data collection experience [124]. If patients and clinicians can personalize or negotiate collected outcomes data with VBC program administrators, we may be able to better align system- and individual-level data collection aims [53].

To align goals, Forlizzi argues that HCI take a service-oriented approach to design, which they call stakeholder-centered design, to account for the needs of stakeholders and their interactions with technologies [45]. In this work, we centered one specific stakeholder: mental health clinicians. Future work taking a stakeholder-centered design perspective could uncover how interactions between patients, payers, providers, and social service organizations contribute to the financing and effective use of HITs in VBC. Methods from service design could center each stakeholder’s design requirements for HITs, and how HITs can support interactions within joint accountability programs [46, 62]. For example, stakeholder mapping could map the power dynamics of stakeholders involved in financing HITs [107]. Service blueprinting could then identify interactions with HITs across stakeholders in joint accountability programs. Surfacing multi-stakeholder perspectives is essential to understanding why a specific HIT may or may not be funded or adopted. Such methods would allow technology designers to infuse stakeholders’ perspectives into the initial development of HITs, build technologies that effectively support VBC, and improve care outcomes for all patients.

Limitations

5.3

These findings reflect our interpretation of the literature joined with the perspectives of 30 mental health clinicians. They should not be interpreted to reflect mental health clinicians as a whole. In addition, we only interviewed mental health clinicians as participants. A broader perspective of value-based mental healthcare would include the views of other stakeholders, including but not limited to patients, health system administrators, payers, and social service administrators. We plan to include these perspectives in future work. Participants and the authors were based in the United States. Findings and implications are thus biased towards a U.S. perspective. In addition, the majority of participants worked in academic medical centers, which are not representative of medical centers and clinics across the United States [44]. We often asked participants during the interviews about their current payment arrangements (e.g., insurance, out-of-pocket pay), but structured data to support our findings were not collected.

Conclusion

5.4

Our findings illuminate opportunities to design health information technologies that support value-based mental healthcare. Specifically, our findings advocate for flexibility in HIT development, giving healthcare providers and patients the choice to collect and store the outcomes data for VBC most aligned with their specific care goals. Simultaneously, future HCI research could take a more health systems-level perspective towards HIT design, by engaging the multiple stakeholders involved in VBC who influence the effectiveness of these technologies. We hope these findings chart a path towards developing technologies that effectively support healthcare payment programs and clinicians’ practice, and that improve health service delivery and patient outcomes.

Bibliography


1. Adler Daniel A., Stamatis Caitlin A., Meyerhoff Jonah, Mohr David C., Wang Fei, Aranovich Gabriel J., Sen Srijan, and Choudhury Tanzeem. 2024. Measuring algorithmic bias to analyze the reliability of AI tools that predict depression risk using smartphone sensed-behavioral data. npj Mental Health Research 3, 1 (April 2024), 1–11. doi:10.1038/s44184-024-00057-y. Publisher: Nature Publishing Group. PMID: 38649446. PMCID: PMC11035598.
2. Adler Daniel A., Tseng Vincent W.-S., Qi Gengmo, Scarpa Joseph, Sen Srijan, and Choudhury Tanzeem. 2021. Identifying Mobile Sensing Indicators of Stress-Resilience. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 2 (June 2021), 51:1–51:32. doi:10.1145/3463528. PMID: 35445162. PMCID: PMC9017954.
3. Adler Daniel A., Wang Fei, Mohr David C., Estrin Deborah, Livesey Cecilia, and Choudhury Tanzeem. 2022. A call for open data to develop mental health digital biomarkers. BJPsych Open 8, 2 (March 2022). doi:10.1192/bjo.2022.28. Publisher: Cambridge University Press. PMID: 35236540. PMCID: PMC8935940.
4. Adler Daniel A., Yang Yuewen, Viranda Thalia, Xu Xuhai, Mohr David C., Van Meter Anna R., Tartaglia Julia C., Jacobson Nicholas C., Wang Fei, Estrin Deborah, and Choudhury Tanzeem. 2024. Beyond Detection: Towards Actionable Sensing Research in Clinical Mental Healthcare. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 4 (Nov. 2024), 160:1–160:33. doi:10.1145/3699755. PMID: 39639863. PMCID: PMC11620792.
5. Agency for Healthcare Research and Quality. 2015. Types of Health Care Quality Measures. https://www.ahrq.gov/talkingquality/measures/types.html. doi:10.1080/15360280802537332. PMID: 21923316.
6. American Psychiatric Association. 2022. The Psychiatric Bed Crisis in the US: Understanding the Problem and Moving Toward Solutions. Technical Report. American Psychiatric Association. https://www.psychiatry.org/getmedia/81f685f1-036e-4311-8dfc-e13ac425380f/APA-Psychiatric-Bed-Crisis-Report-Full.pdf
7. American Psychological Association. 2012. The HITECH Act and eligible professionals: FAQs for psychologists. https://www.apaservices.org/practice/update/2012/07-30/hitech-act
8. Andersen John, Larsen Jens Knud, Kørner Alex, Nielsen Bjarne Mejer, Vilhelm Schultz, Behnke Kirsten, and Bjørum Niels. 1986. The Brief Psychiatric Rating Scale: Schizophrenia, Reliability and Validity Studies. Nordisk Psykiatrisk Tidsskrift 40, 2 (Jan. 1986), 135–138. doi:10.3109/08039488609096456. Publisher: Taylor & Francis.