The evaluation of clinical outcomes assessments and digital health technologies in clinical trials for obesity

Iris A. Goetz; Carolyn Sutter; Traci Abraham; Chisom Kanu; Kristina S. Boye; Tara Symonds

PMC · DOI:10.1186/s41687-025-00841-0·February 20, 2025


Open Access

TL;DR

This paper reviews how clinical trials for obesity use patient-reported outcomes and digital health tools to assess the impact of weight loss on quality of life and physical function.

Contribution

The study provides a comprehensive analysis of COAs and DHTs used in obesity trials, highlighting gaps in DHT adoption and common PRO measures.

Findings

01

PRO measures like the Short Form 36 were most commonly used in obesity trials.

02

Physical function was the most frequently assessed domain in secondary endpoints.

03

Digital health technologies were rarely used compared to traditional COAs.

Abstract

Clinical trials for obesity have traditionally focused on weight loss and resolution of comorbidities as primary outcomes. However, secondary outcomes, such as the impact of weight reduction on patient experience, like health-related quality of life (HRQoL), have increasingly been recognized as important. Therefore, a review was conducted to determine the Clinical Outcome Assessments (COAs) and Digital Health Technologies (DHTs) used in clinical trials for obesity to assess the patient experience. Two clinical trial databases (United States & European Union) were reviewed to identify Phase 2–4 clinical trials for obesity (2018–2023). A targeted literature review was also conducted using the OVID database to identify clinical trial for obesity publications which included COAs/DHTs (2010–2023). Trials from the databases (n = 53) and publications (n = 42) were included in data extraction…


Funding

Eli Lilly and Company (http://dx.doi.org/10.13039/100004312)

Keywords

Chronic weight management; Clinical trials; Obesity; Clinical outcome assessments; Patient reported outcomes; Performance outcome; Digital health technologies; Health-related quality of life; Physical function


Taxonomy

Topics: Eating Disorders and Behaviors · Mobile Health and mHealth Applications · Cardiac Health and Mental Health

Full text

Background

Obesity continues to represent a significant health challenge for populations across the globe. Since the 1980s, the prevalence of overweight and obesity has doubled globally. Currently, about one-third of individuals in the world could be categorized as overweight or obese [1].

Scientific evidence demonstrates strong associations between obesity and both morbidity and mortality, specifically with increased rates of particular cancers, hypertension, stroke, diabetes mellitus, and disability [2, 3]. As such, clinical trials for obesity have traditionally been concerned with weight loss and resolution of comorbidities as primary outcomes [4]. However, over the last 10 years, drug development programs have increasingly incorporated outcomes reported as most important by patients, including the effect of weight loss on health-related quality of life (HRQoL) and well-being. Additionally, patient satisfaction with treatment has become a point of importance [5]. Consequently, over the last several years, clinical outcome assessments (COAs) and other patient-focused measures have been integrated into obesity trials to better measure patient satisfaction and experience, such as in trials for the drugs liraglutide and semaglutide [6, 7].

A COA is defined by the United States Food and Drug Administration (FDA) as any measure that describes or reflects how a patient feels, functions, or survives [8]. There are 4 general types of COA measures: Patient-reported outcomes (PROs), Observer-reported outcomes (ObsROs), Clinician-reported outcomes (ClinROs), and Performance outcomes (PerfOs) [8]. A PRO measure is defined as any report on the condition of a patient’s health that is obtained directly from the patient and does not involve clinician or outside interpretation of the patient’s response [8]. PRO measures attempt to capture the treatment experience from the patient’s point of view. An ObsRO is an evaluation of observable signs and behaviors pertaining to a patient’s health condition by individuals who are commonly around the patient (family, caregivers, etc.) [8]. ClinRO measures are reports that are obtained directly from a trained clinician and convey their interpretation of events, signs, and behaviors related to the patient’s condition. A PerfO is an assessment obtained by asking a patient to complete an established standardized task, such as reading an eye chart [8].

Lastly, digital health technologies (DHTs) can also be used to administer COAs and are generally defined as systems that use computing platforms, connectivity, software and/or sensors to capture patient-focused data [9]. DHTs can collect a wealth of information, including about how a patient is functioning, and therefore could be considered another type of COA, although they have not been formally established as such by regulatory authorities. One example of how a DHT can be used to construct an endpoint is using a wearable fitness tracker to track steps taken per day as a measure of physical fitness. According to one recent scoping review, DHTs were most commonly used to collect physiological data (37.1%), clinical symptoms data (36.9%), and behavioral data (33.5%) [10].

Despite the increased use of COAs in clinical trials, there is no published overview in the literature describing their use to capture the patient perspective in clinical trials for obesity. This gap is particularly important because the patient’s perspective continues to be under-assessed [11]. As such, the aim of this targeted literature review was to identify COAs and DHTs used in Phase 2–4 clinical trials for obesity during the 2018–2023 period, to provide an overview of how the patient experience is being evaluated and how these measures are used to construct endpoints.

Methods

Searches were performed using a 2-step process. The number of years and publications included in data extraction were limited to approximately N = 50 trials to stay within the scope of a targeted literature review.

Step 1 entailed a search of the 2 main United States (US) and European Union (EU) clinical trial databases (clinicaltrials.gov and clinicaltrialsregister.eu) to identify any COAs or DHTs used in obesity trials from June 2018 to June 2023. Searches were conducted using the combined key words: chronic weight management OR obesity OR weight loss OR overweight. Additionally, the parameters of adult only, drug treatment (for US searches only), and Phase 2–4 clinical trials were specified. Trials identified from the search were then screened for inclusion if they mentioned use of COAs or DHTs. US trials were screened first, and only unique trials from the EU search were included (trials duplicating US results were excluded).

Step 2 consisted of a targeted review of published literature between 2010 and 2023 which described clinical trials for obesity and included COAs/DHTs. The search of published literature was performed using the OVID (EMBASE, Medline, and PsycINFO) database, and search terms were developed based on initial searches to facilitate the identification of the most relevant articles. Screening of the OVID search results was a 2-part process: first, a broad screen based on item titles and/or abstracts was applied, and then all items shortlisted from this broad screen were reviewed for final eligibility.

Similarly, a search was conducted of oral and poster abstracts from the conference proceedings of relevant organizations, including Obesity Week, the American Diabetes Association, the European Association for the Study of Diabetes, the International Society for Pharmacoeconomic and Outcomes Research, and the International Society for Quality of Life Research. This search allowed capture of relevant material which may have been presented by poster or oral presentation but was not yet published in the literature. To stay within the scope of the limited review (approximately N = 50 clinical trials), the abstracts search was limited to the past 3 years (January 2021 - June 2023).

Upon completion of the respective reviews, data were extracted and synthesized to present information related to COAs and DHTs currently being used in clinical trials for obesity.

Results

After screening, a total of 53 unique registered clinical trial entries (n = 48 clinicaltrials.gov, n = 5 clinicaltrialsregister.eu) were taken forward for data extraction. Trials included were in Phase 2 (6; 11%), Phase 2–3 (8; 15%), Phase 3 (26; 49%), and Phase 4 (13; 25%). From the 53 trials, 108 different COAs were identified, including 83 PRO measures (77%), 24 PerfO measures (22%), and 1 composite PRO-ClinRO measure (1%). Additionally, 2 DHTs were identified that were used to capture data for a performance outcome related to physical activity. Most trials specified at least 1 COA endpoint (n = 50 trials; 94%). Some trials designated results of COAs as more than 1 type of endpoint (e.g., both a primary and secondary endpoint for different phases of the same trial). The clinical trials investigated 33 drug treatments for obesity (see supplementary information for a full list of drug treatments included in clinical trials).

There were 33 publications identified from the literature search and 9 additional conference abstracts; a total of 42 publications were included for data extraction. After screening for duplicate trials, an additional 20 trials and 13 PRO measures were identified. Most sources were published from 2016 to 2023, with the majority (n = 28, 67%) published from 2021 onward. The published literature presented clinical trials exploring a variety of drug treatments similar to those seen in the database review.

Across the data extracted from the clinical trial entries and published literature, a total of 108 unique COAs, as well as 2 DHTs, were identified as being used to measure outcomes in obesity trials. In total, 73 clinical trials were included (N = 73). The majority of the COAs were PRO measures (n = 83; 77%), although many PerfOs (n = 24; 22%) were also identified, as well as 1 composite PRO-ClinRO measure (n = 1; 1%). The COAs were used to construct co-primary or key secondary endpoints in 25 trials (34%), secondary or supportive secondary endpoints in 63 trials (86%), and exploratory endpoints in 8 trials (11%). The measures were organized into the following categories: HRQoL, mental health-related, disordered eating-related, eating-related thoughts and behaviors, physical activity, sleep-related, cognition-related, symptoms and impacts related to osteoarthritis, impacts on work, and “other” measures.
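As a quick arithmetic check, the shares quoted in this paragraph can be recomputed from the reported counts. The sketch below is illustrative only: the helper name `share` and the round-half-up convention are assumptions, not something the review specifies.

```python
def share(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage (round half up; assumed convention)."""
    return int(100 * count / total + 0.5)

# Counts as reported in the text.
TOTAL_COAS = 108
coa_counts = {"PRO": 83, "PerfO": 24, "composite PRO-ClinRO": 1}
coa_shares = {kind: share(n, TOTAL_COAS) for kind, n in coa_counts.items()}

TOTAL_TRIALS = 73
endpoint_trials = {
    "co-primary or key secondary": 25,
    "secondary or supportive secondary": 63,
    "exploratory": 8,
}
endpoint_shares = {kind: share(n, TOTAL_TRIALS) for kind, n in endpoint_trials.items()}

print(coa_shares)
print(endpoint_shares)
```

The endpoint categories do not sum to 100% because a single trial could position COAs at more than 1 endpoint level.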

Health-related quality of life measures

Measures of HRQoL were most frequently used to derive endpoints in the 73 trials (n = 45/73; 62%); see Table 1. All of the identified measures assessing HRQoL (n = 7) were PRO measures.

The SF-36 (Short-Form 36 items) was the most frequently identified measure and was used to derive an endpoint for 32 of the 45 trials. The SF-36 comprises 8 domains (mental health, role-emotional, social functioning, vitality, role-physical, physical functioning, bodily pain, and general health), which contribute to 2 summary component scores of wellbeing (physical health and mental health) [12]. The SF-36 was mainly used to construct a secondary endpoint considering change from baseline to end of trial. Less often, it was used to form a confirmatory or supportive secondary endpoint considering the percentage of patients who achieved a predefined meaningful within-person improvement, or to construct an exploratory endpoint. The endpoint position is specified in Table 1, below.

The IWQoL-Lite-CT was the second most widely used HRQoL measure and was used to construct a secondary endpoint for 21 trials. The IWQoL-Lite-CT is an alternative version of the IWQoL-Lite developed specifically for use in obesity clinical trials to assess psychosocial and physical functioning of patients. The measure has 20 items and 2 main domains (physical and psychosocial) [13, 14]. Seven of the 21 trials used only the IWQoL-Lite-CT physical function composite score as a confirmatory or secondary endpoint; the trials used either change in the composite score from baseline to end of trial or used the percentage of participants that experienced meaningful improvement for these endpoints.
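The two endpoint constructions described above for the SF-36 and the IWQoL-Lite-CT (mean change from baseline to end of trial, and the percentage of participants with a meaningful within-person improvement) can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are made-up data, higher scores are taken to mean better status, and `HYPOTHETICAL_THRESHOLD` is a placeholder, not a threshold reported in the review or validated for either instrument.

```python
from statistics import mean
from typing import List

def change_from_baseline(baseline: List[float], end_of_trial: List[float]) -> List[float]:
    """Per-participant change score (end of trial minus baseline)."""
    return [end - base for base, end in zip(baseline, end_of_trial)]

def responder_rate(changes: List[float], threshold: float) -> float:
    """Fraction of participants whose improvement meets or exceeds the threshold."""
    responders = sum(1 for change in changes if change >= threshold)
    return responders / len(changes)

# Made-up scores for 4 participants (illustration only).
baseline = [40.0, 55.0, 62.5, 48.0]
end_of_trial = [52.0, 57.0, 80.0, 47.0]

# Placeholder for a "predefined meaningful within-person improvement".
HYPOTHETICAL_THRESHOLD = 10.0

changes = change_from_baseline(baseline, end_of_trial)
mean_change = mean(changes)                             # continuous endpoint
rate = responder_rate(changes, HYPOTHETICAL_THRESHOLD)  # responder endpoint

print(changes, mean_change, rate)
```

The continuous endpoint summarizes the average shift in a treatment arm, while the responder endpoint reports how many individuals crossed a threshold fixed before the analysis.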

Less frequently reported measures included the EQ-5D-5L [15], which was used to form a secondary endpoint measuring mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Additionally, the Mean QOL questionnaire and World Health Organization Quality of Life - Brief Version [16] were used to construct secondary endpoints, with the World Health Organization Quality of Life - Brief Version measuring physical health, psychological health, social relationships, and environment. Finally, the Treatment-Related Impact Measure-Weight [17] was used to derive a supportive secondary endpoint measuring HRQoL related to weight loss. Additional details about these measures are provided in Table 1, below.

Table 1. COAs identified in review: health-related quality of life

| Measure | Type of COA | # of Trials (n = 45) | Concept | Endpoint position (frequency per measure across trials) |
| --- | --- | --- | --- | --- |
| SF-36^a^ | PRO | 32 | HRQoL (physical function, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, mental health) | Secondary (20); Confirmatory secondary (7); Supportive secondary (2); Key secondary efficacy (1); Exploratory (2) |
| IWQOL-Lite-CT | PRO | 21 | HRQoL (physical function, self-esteem, sexual life, public distress, work) | Secondary (16); Confirmatory secondary (5) |
| EQ-5D-5L | PRO | 9 | QoL (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) | Secondary (5); Not provided (4) |
| IWQOL-Lite | PRO | 7 | HRQoL (physical function, self-esteem, sexual life, public distress, work) | Secondary (2); Supportive secondary (1); Exploratory (4) |
| TRIM-Weight | PRO | 1 | QoL related to weight loss | Supportive secondary (1) |
| Mean QOL questionnaire | PRO | 1 | QoL | Secondary (1) |
| WHOQOL-BREF | PRO | 1 | QoL (physical health, psychological health, social relationships, environment) | Secondary (1) |

EQ-5D-5L = EQ-5D 5 Level; COA = clinical outcome assessment; QoL = quality of life; IWQoL-Lite = Impact of Weight on Quality of Life-Lite; IWQoL-Lite-CT = Impact of Weight on Quality of Life-Lite Clinical Trials; PRO = patient-reported outcome; SF-36 = Short-Form 36 items; TRIM-Weight = Treatment-Related Impact Measure-Weight; WHOQOL-BREF = World Health Organization Quality of Life-BREF

^a^ As it is possible that sponsors did not distinguish between use of the SF-36 and SF-36 v2, we have reported all use of these measures as “SF-36” in the table

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites, thus measures included in a trial without endpoint information are listed as “not provided” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website

Mental health-related measures

A total of 8 mental health-related measures were identified across 21 trials (n = 21/73; 29%) (see Table 2). Most of the mental health-related measures (n = 7/8; 88%) were PRO measures; additionally, 1 composite PRO-ClinRO measure (the Columbia Suicide Severity Rating Scale) was identified.

The Patient Health Questionnaire 9 items (PHQ-9) [18] was used in 14 of the 21 trials as a measure of depression, most frequently to derive a secondary endpoint, with some trials specifically indicating its use to construct a supportive secondary endpoint, whilst others used it to construct a safety outcome. The PHQ-9 is a PRO measure that aims to assess severity of depression in patients. This questionnaire contains 9 items corresponding to the 9 criteria on which the DSM-IV diagnosis of depression and depression-related disorders is based. These items include interest/pleasure in doing things, feeling depressed or hopeless, sleep difficulty, energy level, appetite, self-image, concentration ability, moving or speaking slowly, and suicidal thoughts [19].

Less frequently used mental health measures for deriving secondary endpoints were the Columbia Suicide Severity Rating Scale, Beck Depression Inventory second edition, Patient Reported Outcomes Measurement Information System Depression and Anxiety scales, the Generalized Anxiety Disorder 7, and the Self-Reporting Questionnaire 20-item. The Perceived Stress Scale [20] was also identified, but the endpoint positioning was not described. Additional details about these measures are described in Table 2, below.

Table 2. COAs identified in review: mental health-related

| Measure | Type of COA | # of Trials (n = 21) | Concept | Endpoint position (frequency of measure across trials) |
| --- | --- | --- | --- | --- |
| PHQ-9 | PRO | 14 | Depression | Secondary (5); Supportive secondary (4); Other secondary (1); Exploratory (1); Safety outcome (2); Not provided (1) |
| C-SSRS | PRO & ClinRO | 6 | Depression | Secondary (4); Safety outcome (3) |
| BDI-II | PRO | 4 | Depression | Secondary (2); Screening (1); Not described (1) |
| PSS | PRO | 3 | Stress | Exploratory predictor variable (1); Not described (2) |
| PROMIS Anxiety (Short Form v1.08a) | PRO | 1 | Anxiety | Secondary (1) |
| PROMIS Depression (Short Form v1.08a) | PRO | 1 | Depression | Secondary (1) |
| GAD-7 | PRO | 1 | Anxiety | Exploratory predictor variable (1) |
| SRQ-20 | PRO | 1 | Distress | Secondary |

BDI-II = Beck Depression Inventory 2nd Edition; COA = clinical outcome assessment; C-SSRS = Columbia Suicide Severity Rating Scale; GAD-7 = Generalized Anxiety Disorder 7; PHQ-9 = Patient Health Questionnaire-9; ND = Not disclosed; PRO = patient-reported outcome; PROMIS = Patient Reported Outcomes Measurement Information System; PSS = Perceived Stress Scale; SRQ-20 = Self-Reporting Questionnaire 20-item

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites, thus measures included in a trial without endpoint information are listed as “not provided” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website

Eating-related thoughts and behaviors measures

A total of 25 measures of eating-related thoughts and behaviors were used in 18 trials (n = 18/73; 25%) (see Table 3). Most of these measures (n = 19/25; 76%) assessing eating-related thoughts and behaviors were PRO measures, although some PerfO measures (n = 6/25; 24%) were also used.

Table 3. COAs identified in review: eating-related thoughts and behaviors

| Measure | Type of COA | # of Trials (n = 18) | Concept | Endpoint position (frequency of measure across trials) |
| --- | --- | --- | --- | --- |
| VAS | PRO | 11 | Appetite sensations/Palatability | Co-primary (2); Secondary (10); Exploratory (1); Not described (1) |
| CoEQ | PRO | 4 | Food cravings (craving control, craving for savory, craving for sweet, positive mood) | Secondary (1); Supportive secondary (1); Exploratory (2) |
| PFS | PRO | 3 | Responsiveness to food environment | Secondary (1); Exploratory predictor variable (1); Not described (1) |
| EI | PRO | 3 | Eating behavior (cognitive restraint, dietary disinhibition) | Supportive secondary (1); Exploratory predictor variable (1); Not described (1) |
| LFPQ | PerfO | 3 | Food preference | Secondary (2); Supportive secondary (1) |
| FFCS | PRO | 2 | Food cravings | Secondary (2) |
| Chocolate Milkshake Drinking Task | PRO | 2 | Hedonic food intake | Co-primary (2) |
| FCQ-T | PRO | 1 | Food cravings | Supportive secondary (1) |
| FCQ-T-Reduced | PRO | 1 | Food cravings | Exploratory predictor variable (1) |
| PFS-15 item | PRO | 1 | Responsiveness to food environment | Secondary (1) |
| FCI | PRO | 1 | Food cravings | Secondary (1) |
| ASA24 | PRO | 1 | Diet quality | Co-primary (1) |
| DFS | PRO | 1 | Diet quality | Co-primary (1) |
| 3-Day Food Diary | PRO | 1 | Diet quality | Co-primary & Secondary (1) |
| GLMS | PRO | 1 | Food intensity perception | Co-primary (1) |
| LHS | PRO | 1 | Food preference | Co-primary (1) |
| BHE | PRO | 1 | Barriers (lack of knowledge, self-control, time) | Exploratory predictor variable (1) |
| RED-13 | PRO | 1 | Food reinforcement | Co-primary (1) |
| DEBQ | PRO | 1 | Eating behavior (emotional eating, external eating, restraint) | Exploratory predictor variable (1) |
| TFEQ | PRO | 1 | Eating behavior | Not described (1) |
| FSI | PerfO | 1 | Satiety | Secondary (1) |
| Becker DeGroot Markov Auction Task, modified | PerfO | 1 | Food reinforcement | Co-primary (1) |
| Reinforcing Efficacy of High- and Low-calorie Food | PerfO | 1 | Food reinforcement | Exploratory predictor variable (1) |
| RRV-F | PerfO | 1 | Motivation to eat | Secondary (1) |
| 4-Meter Fast Paced Walk Test | PerfO | 1 | Motivation to eat | Secondary (1) |

ASA24 = Automated Self-Administered 24-Hour Recall; BHE = Barriers to Healthy Eating and Physical Activity; COA = clinical outcome assessment; CoEQ = Control of Eating Questionnaire; DEBQ = Dutch Eating Behavior Questionnaire; DFS = Dietary Fat & Sugar Intake Questionnaire; EI = Eating Inventory; FCI = Food Craving Inventory; FCQ-T = General Food Cravings Questionnaire – Trait; FFCS = Favorite Food Craving Scale; FSI = Food Satiety Index; GLMS = General Labeled Magnitude Scale; LFPQ = Leeds Food Preference Questionnaire; LHS = Labeled Hedonic Scale; PerfO = performance outcome; PFS = Power of Food Scale; PRO = patient-reported outcome; RED-13 = Reward-Related Eating Questionnaire; RRV-F = Relative Reinforcing Value of Food; TFEQ = Three Factor Eating Questionnaire; VAS = visual analogue scale

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites, thus measures included in a trial without endpoint information are listed as “not provided” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website

Visual analogue scales (VAS) [21] measuring appetite sensations (including hunger, fullness, satiety, prospective food consumption) and palatability were most frequently identified, having been reported for 11 of the 18 trials. Most of the trials (10/11) used a VAS to construct a secondary endpoint; less frequently, it was used as a co-primary or exploratory endpoint. In general, VAS are used in clinical trials to measure concepts like pain, thirst, and hunger. Scores are created from self-reported responses that are indicated by a written mark placed along a 10 cm line, where the left end of the line represents the absence of the concept, and the right end represents the greatest severity (worst pain, worst hunger, etc.) [21, 22]. Less frequently identified PRO measures are provided in Table 3. These PRO measures were primarily used to form a secondary endpoint but were also used to form a co-primary endpoint or a supportive secondary endpoint, or to derive an exploratory predictor variable.
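The VAS scoring step described above can be sketched in a few lines. The 10 cm line comes from the text; measuring the mark in millimetres from the left anchor and reporting a 0 to 100 score is a common convention assumed here for illustration, and the function name `vas_score` is hypothetical.

```python
LINE_LENGTH_MM = 100.0  # the 10 cm line described in the text

def vas_score(mark_mm: float) -> float:
    """Convert the distance of the mark from the left anchor (mm) to a 0-100 score.

    0 corresponds to absence of the concept (left end) and 100 to the greatest
    severity (right end). The 0-100 scale is an assumed reporting convention.
    """
    if not 0.0 <= mark_mm <= LINE_LENGTH_MM:
        raise ValueError("mark must lie on the printed line")
    return 100.0 * mark_mm / LINE_LENGTH_MM

hunger_score = vas_score(63)  # a mark 63 mm from the left end
print(hunger_score)
```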

PerfO measures included the Reinforcing Efficacy of High- and Low-calorie Food, Leeds Food Preference Questionnaire, Becker DeGroot Markov Auction Task, Reward-Related Eating Questionnaire 13 item, the Relative Reinforcing Value of Food, and the 4-Meter Fast Paced Walk Test. These measures were used to derive co-primary endpoints, secondary endpoints, supportive secondary endpoints, and/or exploratory predictor variables. Additional details about each of these measures are provided in Table 3.

Physical activity-related measures

A total of 9 measures of physical activity were identified in 16 trials (n = 16/73; 22%) (see Table 4). These measures assessing physical activity included 5 PRO measures (n = 5/9; 56%), 2 PerfO measures (n = 2/9; 22%), and 2 DHTs (n = 2/9; 22%).

The 6-Minute Walk Test (6-MWT) [23], a PerfO measure in which the distance covered in 6 minutes is calculated, was the most frequently used measure (8/16 trials). It was used in 4 trials to construct a secondary endpoint, and in 1 of those 4 trials it also formed part of a co-primary endpoint with the Kansas City Cardiomyopathy Questionnaire (KCCQ). When used to form a co-primary endpoint, the 6-MWT was listed third in a hierarchical composite that also included all-cause mortality and heart failure events. The reported unit was the total “wins” for each treatment group obtained from a hierarchical comparison of the components (randomization to study completion). The endpoint position was not described for 4 trials.
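The “wins” unit reported for the hierarchical composite can be illustrated with a pairwise comparison sketch: every participant in one arm is compared with every participant in the other, walking down the hierarchy until one of them does better on a component. The component order follows the text (all-cause mortality, heart failure events, then the 6-MWT); everything else is an assumption for illustration, including the `Participant` fields, the tie rules, the `walk_margin_m` parameter, and the made-up data. Time-to-event information, which a real analysis would normally use, is ignored here.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

@dataclass
class Participant:
    died: bool            # all-cause mortality during follow-up
    hf_events: int        # number of heart failure events
    walk_change_m: float  # change in 6-minute walk distance (metres)

def compare(a: Participant, b: Participant, walk_margin_m: float = 0.0) -> int:
    """Return +1 if a wins, -1 if b wins, 0 if tied on every component."""
    if a.died != b.died:                      # 1. survivor wins
        return 1 if b.died else -1
    if a.hf_events != b.hf_events:            # 2. fewer heart failure events wins
        return 1 if a.hf_events < b.hf_events else -1
    diff = a.walk_change_m - b.walk_change_m  # 3. larger walk improvement wins
    if abs(diff) > walk_margin_m:
        return 1 if diff > 0 else -1
    return 0

def count_wins(treatment: List[Participant], control: List[Participant]) -> Tuple[int, int, int]:
    """Total wins for treatment, wins for control, and ties over all pairs."""
    wins_t = wins_c = ties = 0
    for t, c in product(treatment, control):
        result = compare(t, c)
        if result > 0:
            wins_t += 1
        elif result < 0:
            wins_c += 1
        else:
            ties += 1
    return wins_t, wins_c, ties

# Made-up participants (illustration only).
treatment = [Participant(False, 0, 30.0), Participant(False, 1, 10.0), Participant(True, 0, 0.0)]
control = [Participant(False, 0, 5.0), Participant(True, 2, 0.0)]

print(count_wins(treatment, control))
```

Because lower components are consulted only when higher ones tie, a functional measure such as the 6-MWT contributes to the result only for pairs that cannot be separated on mortality or heart failure events.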

The Patient Global Impression of Severity for physical function and the Patient Global Impression of Change for physical function were both used as frequently as the 6-MWT; however, no endpoint positions were provided for either of these measures. The Patient Global Impression of Severity is a single-item self-administered measure that aims to evaluate severity of condition from the patient’s perspective, and is commonly used as an outcome measure for various diseases [24]. The Patient Global Impression of Change is another self-administered measure that seeks to assess the perception of change following treatment (improvement versus worsening) from the point of view of the patient [25].

Less frequently identified measures in the trials database and published literature included the Paffenbarger Physical Activity Questionnaire, International Physical Activity Questionnaire, and the Exercise Self Efficacy Scale. These PRO measures were used to derive a primary endpoint, secondary endpoint, exploratory endpoint, and/or as an exploratory predictor variable. The 4-Meter Fast Paced Walk Test, a PerfO measure, was also identified and used to derive a secondary endpoint.

The 2 DHTs were the ActiGraph wGT3X-BTLink accelerometer, which was used to measure physical activity, gait, and balance, and the VitalCare digital health platform application, which was used to measure steps, calories per day, and exercise sessions per week. They were used to construct part of a co-primary endpoint and a secondary endpoint, respectively. Additional details about each of these measures are described in Table 4, below.

Table 4. COAs and DHTs identified in review: physical activity-related

| Measure | Type of COA | # of Trials (n = 16) | Concept | Endpoint position (frequency of measure across trials) |
| --- | --- | --- | --- | --- |
| 6MWT / 6MWD | PerfO | 8 | Physical capacity | Primary, Secondary (1); Co-primary, Secondary (1); Secondary (1); Confirmatory secondary, Supportive secondary (1); Confirmatory secondary (1); Not provided (4) |
| PGIS for physical activity/function | PRO | 8 | Physical activity/function | Not provided (8) |
| PGIC for physical activity/function | PRO | 8 | Physical activity/function | Not provided (8) |
| PPAQ | PRO | 1 | Physical activity | Exploratory (1) |
| iPAQ | PRO | 1 | Physical activity | Screening (1) |
| ESES | PRO | 1 | Exercise self-efficacy | Exploratory predictor variable (1) |
| 4-Meter Fast Paced Walk Test | PerfO | 1 | Physical capacity | Secondary (1) |
| ActiGraph wGT3X-BTLink accelerometer | DHT | 1 | Physical activity, gait, balance | Co-primary (1) |
| VitalCare digital health platform (app) | DHT | 1 | Physical activity (steps, calories per day, exercise sessions per week) | Secondary (1) |

6MWT = 6-minute Walk Test; 6MWD = 6-minute Walk Distance; COA = clinical outcome assessment; DHT = digital health technology; ESES = Exercise Self Efficacy Scale; iPAQ = International Physical Activity Questionnaire; PerfO = performance outcome; PGIC = Patient Global Impression of Change; PGIS = Patient Global Impression of Severity; PPAQ = Paffenbarger Physical Activity Questionnaire; PRO = patient-reported outcome

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites, thus measures included in a trial without endpoint information are listed as “not provided” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website

Disordered eating-related measures

A total of 10 measures of disordered eating (food addiction, impulsivity regarding food, and binge eating) were found for 15 trials (n = 15/73; 21%) (see Table 5). All of these measures assessing disordered eating (n = 10) were PRO measures.

The Eating Disorder Examination Questionnaire (EDE-Q 6.0) [26] was most frequently reported as a measure of disordered eating (11/15 trials) and was typically used to construct a secondary endpoint across the clinical trials (8 out of 11 trials), and in 1 trial to derive a supportive secondary endpoint. The endpoint position for this PRO measure was not provided for 2 trials. The EDE-Q 6.0 is a widely used self-reported measure that assesses behavior and attitudes in eating disorders [27]. This measure is based on the well-established Eating Disorder Examination (EDE), which has commonly been considered the gold standard for measuring eating disorders (ED) [28]. As it is self-reported and relatively simple to administer, the EDE-Q 6.0 offers a valid, cost-efficient alternative to the EDE that can be particularly useful when dealing with large populations [29]. The EDE-Q was used to assess change in global score and change in all of the subscales (dietary restraint, eating concern, weight concern, and shape concern).

Less frequently identified PRO measures included the interview version of the EDE; the Yale Food Addiction Scale; the Urgency, Premeditation, Perseverance, Sensation Seeking, and Positive Urgency Impulsive Behavior Scale; the Binge Eating Scale; the Barratt Impulsiveness Scale; the Behavioral Inhibition/Activation Scale; the Eating Disorder Inventory; the Questionnaire on Eating and Weight Patterns; and the Loss of Control Eating Scale. These PRO measures were primarily used to construct secondary endpoints, but were also used to derive co-primary endpoints, supportive secondary endpoints, or exploratory predictor variables. Additional details about each of these measures are described in Table 5, below.

Table 5. COAs identified in review: disordered eating-related

| Measure | Type of COA | # of Trials (n = 15) | Concept | Endpoint position (frequency of measure across trials) |
| --- | --- | --- | --- | --- |
| EDE-Q 6.0 | PRO | 11 | Disordered eating | Secondary (8); Supportive secondary (1); Not provided (2) |
| EDE-I | PRO | 8 | Disordered eating | Co-primary (7); Secondary (6); Not provided (1) |
| YFAS | PRO | 6 | Food addiction | Secondary (3); Exploratory (1); Exploratory predictor variable (1); Not provided (1) |
| UPPS-P | PRO | 2 | Impulsive behavior (urgency, deliberation, persistence, sensation seeking) | Secondary (2) |
| BES | PRO | 2 | Binge eating, including key behavioral (e.g., rapid eating, eating large amounts of food) and affective/cognitive symptoms (e.g., guilt, feeling out of control or unable to stop eating) that precede or follow a binge | Safety outcome (1); Not provided (1) |
| BIS-11 | PRO | 1 | Impulsive behavior (attention, motor, self-control, cognitive complexity, perseverance, cognitive instability, as well as attentional, motor, non-planning impulsiveness) | Exploratory predictor variable (1) |
| BIS/BAS | PRO | 1 | Behavioral inhibition (reward responsiveness, drive, fun seeking) | Exploratory predictor variable (1) |
| EDI | PRO | 1 | Disordered eating (drive for thinness, bulimia, body dissatisfaction, ineffectiveness, perfectionism, interpersonal distrust, interoceptive awareness, maturity fears) | Supportive secondary (1) |
| QEWP-5 | PRO | 1 | Binge eating | Exploratory predictor variable (1) |
| LOCES | PRO | 1 | Behavioral, cognitive/dissociative, and positive/euphoric aspects of loss-of-control eating | Supportive secondary (1) |

BES = Binge Eating Scale; BIS-11 = Barratt Impulsiveness Scale; BIS/BAS = Behavioral Inhibition/Activation Scale; COA = clinical outcome assessment; EDE-I = Eating Disorder Examination Interview; EDE-Q 6.0 = Eating Disorder Examination Questionnaire; EDI = Eating Disorder Inventory; LOCES = Loss of Control Eating Scale; PRO = patient-reported outcome; QEWP-5 = The Questionnaire on Eating and Weight Patterns; UPPS-P = Urgency, Premeditation, Perseverance, Sensation Seeking, and Positive Urgency Impulsive Behavior Scale; YFAS = Yale Food Addiction Scale

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites, thus measures included in a trial without endpoint information are listed as “not provided” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website

Measures of impacts on work

A total of 4 measures of impacts on work were identified in 6 trials (n = 6/73; 8%) (see Table 6). All 4 of the measures assessing impacts on work were PRO measures.

The Work Productivity and Activity Impairment Questionnaire-Specific Health Problem v2.0 [30] was most frequently reported (5/6 trials) and was used to construct a secondary endpoint. This questionnaire is a version of the Work Productivity and Activity Impairment questionnaire that aims to measure the impact of disease on work productivity and activity in the context of a specific health problem [30, 31]. The self-administered instrument consists of 6 questions that evaluate employment status, missed work time due to condition, total amount of time worked, and feelings about the condition’s effect on productivity and ability in work and outside of work over the last seven days [31].
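To make the four concepts captured by this questionnaire concrete (absenteeism, presenteeism, overall work productivity loss, and activity impairment), the following is a minimal scoring sketch. It is not taken from the reviewed trials: the field names are hypothetical, and the formulas are an assumption based on the conventional WPAI scoring approach, in which each score is expressed as a percentage of impairment. The question about work time missed for reasons other than the health problem is omitted because it does not enter these scores.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WpaiResponse:
    """One respondent's answers for the past seven days (hypothetical field names)."""
    employed: bool              # currently in paid employment
    hours_missed_health: float  # hours of work missed because of the health problem
    hours_worked: float         # hours actually worked
    productivity_rating: int    # 0-10, effect of the problem on productivity while working
    activity_rating: int        # 0-10, effect of the problem on regular non-work activities


def score_wpai(resp: WpaiResponse) -> Dict[str, float]:
    """Return impairment percentages; higher values mean greater impairment."""
    for rating in (resp.productivity_rating, resp.activity_rating):
        if not 0 <= rating <= 10:
            raise ValueError("ratings must be between 0 and 10")
    if resp.hours_missed_health < 0 or resp.hours_worked < 0:
        raise ValueError("hours cannot be negative")

    # Activity impairment applies to every respondent, employed or not.
    scores = {"activity_impairment": round(10.0 * resp.activity_rating, 1)}

    # Work-related scores are only computed here for employed respondents
    # who worked at least some hours in the recall period.
    if not resp.employed or resp.hours_worked <= 0:
        return scores

    absenteeism = resp.hours_missed_health / (resp.hours_missed_health + resp.hours_worked)
    presenteeism = resp.productivity_rating / 10
    # Overall loss combines time missed with reduced productivity while at work.
    overall = absenteeism + (1 - absenteeism) * presenteeism

    scores["absenteeism"] = round(100 * absenteeism, 1)
    scores["presenteeism"] = round(100 * presenteeism, 1)
    scores["overall_work_impairment"] = round(100 * overall, 1)
    return scores
```

Under these assumptions, a respondent who missed 8 hours, worked 32 hours, and rated productivity impairment at 3 and activity impairment at 5 would score 20% absenteeism, 30% presenteeism, 44% overall work impairment, and 50% activity impairment.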

Less frequently described PROs included the Stanford Presenteeism Scale and the Work Limitations Questionnaire 8-item and 25-item versions. Both versions of the Work Limitations Questionnaire were used to derive secondary endpoints. Additional details about each of these measures are provided in Table 6 below.

Table 6 COAs identified in review: measures of impacts on work

| Measure | Type of COA | # of Trials (n = 6) | Concept | Endpoint Position | Frequency of Measure Across Trials |
|---|---|---|---|---|---|
| WPAI: SHP | PRO | 5 | Impact of weight on work productivity (absenteeism, presenteeism, work productivity loss, activity impairment) | Secondary | 4 |
| | | | | Not provided | 1 |
| SP-6 | PRO | 4 | Health status and employee productivity | Not provided | 4 |
| WLQ-25 | PRO | 1 | Impact on work (time management, physical demands, mental-interpersonal demands, output) | Secondary | 1 |
| WLQ-8 | PRO | 1 | Impact of weight on work productivity (time management, physical tasks, mental or interpersonal tasks, and output tasks along with an index of overall at-work productivity loss) | Secondary | 1 |

COA = clinical outcome assessment; PRO = patient-reported outcome; SP-6 = Stanford Presenteeism Scale; WLQ-8 = Work Limitations Questionnaire 8-item version; WLQ-25 = Work Limitations Questionnaire 25-item version; WPAI: SHP = Work Productivity and Activity Impairment Questionnaire Specific Health Problem V2.0

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites; thus, measures included in a trial without endpoint information are listed as “not provided,” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website.

Sleep-related measures

A total of 5 sleep-related measures were identified in 5 trials (n = 5/73; 7%) (Table 7). All 5 of the measures used in these trials were PRO measures.

The Epworth Sleepiness Scale (ESS) [32] and Functional Outcomes of Sleep Questionnaire (FOSQ) [33] were the most frequently reported PRO measures, each identified in 2 of the 5 trials. The ESS is a self-administered measure of daytime sleepiness that asks patients to rate their chance of falling asleep in 8 common situations from 0 to 3 (0 indicating a low chance of falling asleep) [32, 34]. Item scores are summed to create a total score, with higher scores indicating greater daytime sleepiness [32]. The FOSQ also measures the impact of sleepiness on daily life and is considered the gold standard among similar measures [35]. This self-administered measure has 30 items that evaluate the impact of excessive daytime sleepiness on patients’ physical, mental, and social functioning [33]. One trial included the ESS, FOSQ, and FOSQ-10 as secondary endpoints: the ESS was used to measure the percentage of participants with ESS ≤ 10 as part of a composite, and the FOSQ to measure change in a hierarchical composite score including the FOSQ-10 item subset score as well as the FOSQ vigilance and activity level domain scores. The other trial used the ESS and FOSQ to construct key secondary efficacy endpoints. Additional sleep-related measures included the FOSQ-10; the Regularity, Satisfaction, Alertness, Timing, Efficiency, and Duration (RU-SATED) scale, which measures sleep health; the Bergen Insomnia Scale; and a baseline sleep hours survey (unspecified). These PRO measures were used to derive a secondary endpoint and/or to form an exploratory predictor variable. Additional details about each of these measures are provided in Table 7 below.
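As an illustration of the ESS scoring described above (8 situations, each rated 0 to 3 and summed), the sketch below computes a total score and checks it against the ESS ≤ 10 criterion that one trial used within a composite endpoint. The function names are hypothetical and the code is only a sketch of the arithmetic, not the scoring procedure of any specific trial.

```python
from typing import Sequence

ESS_ITEM_COUNT = 8          # number of situations rated, as described in the text
ESS_ALLOWED_RATINGS = (0, 1, 2, 3)


def score_ess(item_ratings: Sequence[int]) -> int:
    """Sum the 8 item ratings; the total therefore ranges from 0 to 24."""
    if len(item_ratings) != ESS_ITEM_COUNT:
        raise ValueError("the ESS total requires exactly 8 item ratings")
    if any(rating not in ESS_ALLOWED_RATINGS for rating in item_ratings):
        raise ValueError("each item must be rated 0, 1, 2, or 3")
    return sum(item_ratings)


def meets_ess_criterion(total_score: int, cutoff: int = 10) -> bool:
    """True when the total is at or below the cutoff (ESS <= 10 in the trial above)."""
    return total_score <= cutoff


def percent_meeting_criterion(totals: Sequence[int], cutoff: int = 10) -> float:
    """Percentage of participants whose total score is at or below the cutoff."""
    if not totals:
        raise ValueError("at least one participant is required")
    met = sum(1 for total in totals if meets_ess_criterion(total, cutoff))
    return 100.0 * met / len(totals)
```

For example, ratings of 1, 2, 0, 3, 1, 0, 2, and 1 sum to a total of 10, which meets the ESS ≤ 10 criterion, whereas the maximum possible total of 24 does not.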

Table 7 COAs identified in review: sleep-related

| Measure | Type of COA | # of Trials (n = 5) | Concept | Endpoint Position | Frequency of Measure Across Trials |
|---|---|---|---|---|---|
| FOSQ | PRO | 2 | Outcomes of sleep | Key secondary | 1 |
| | | | | Secondary | 1 |
| ESS | PRO | 2 | Sleepiness | Key secondary | 1 |
| | | | | Secondary | 1 |
| RU-SATED scale | PRO | 1 | Sleep health (regularity, satisfaction, alertness, timing, efficiency, duration) | Secondary | 1 |
| FOSQ-10 | PRO | 1 | Outcomes of sleep (activity level, vigilance, intimacy, sexual relationships) | Secondary | 1 |
| Bergen Insomnia Scale | PRO | 1 | Sleep | Not provided | 1 |
| Baseline sleep hours survey (unspecified) | PRO | 1 | Sleep | Exploratory predictor variable | 1 |

COA = clinical outcome assessment; DHT = digital health technology; ESS = Epworth Sleepiness Scale; FOSQ = Functional Outcomes of Sleep Questionnaire; FOSQ-10 = Functional Outcomes of Sleep Questionnaire 10-item; PRO = patient-reported outcome; RU-SATED = Regularity, Satisfaction, Alertness, Timing, Efficiency, and Duration scale

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites; thus, measures included in a trial without endpoint information are listed as “not provided,” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website.

Cognition-related measures

A total of 17 cognition-related measures were identified in 4 trials (n = 4/73; 5%) (see Table 8), all of which were PerfO measures. Each of these 17 PerfO measures was identified in only 1 trial.

Table 8 COAs identified in clinical review: cognition-related

| Measure | Type | # of Trials (n = 4) | Concept | Endpoint Position | Frequency of Measure Across Trials |
|---|---|---|---|---|---|
| Probabilistic-Feedback Reward Task (unspecified) | PerfO | 1 | Cognition | Co-primary | 1 |
| Brief Neuropsychological Battery for Obesity (BNBO) Measures | | | | | |
| Delay Discounting, Kirby | PerfO | 1 | Cognition (temporal discounting) | Co-primary | 1 |
| Oral Reading Recognition Test | PerfO | 1 | Cognition (language decoding, reading) | Co-primary | 1 |
| Penn Progressive Matrices Test | PerfO | 1 | Cognition (fluid intelligence) | Co-primary | 1 |
| Penn Word Memory Test | PerfO | 1 | Cognition (verbal episodic memory) | Co-primary | 1 |
| Relational Task | PerfO | 1 | Cognition (visual relational processing) | Co-primary | 1 |
| Delay Discounting | PerfO | 1 | Cognition (reward sensitivity) | Secondary predictor variable (Phase 1), Secondary outcome (Phase 2) | 1 |
| Variable Short Penn Line Orientation Test | PerfO | 1 | Cognition (visuospatial processing) | Co-primary | 1 |
| Core Neuropsychological Measures for Obesity and Diabetes (NMOB) Measures | | | | | |
| Digital Symptom Substitution | PerfO | 1 | Cognition (processing speed) | Co-primary | 1 |
| Dimensional Change Card Sorting | PerfO | 1 | Cognition (cognitive flexibility, task-switching) | Co-primary | 1 |
| Go/No-Go Task | PerfO | 1 | Cognition (response inhibition) | Co-primary | 1 |
| Matrix Reasoning Task | PerfO | 1 | Cognition (general cognitive ability, non-verbal reasoning ability) | Co-primary | 1 |
| Picture Sequence Memory | PerfO | 1 | Cognition (learning, memory) | Co-primary | 1 |
| Cambridge Neuropsychological Test Automated Battery (CANTAB) Measures | | | | | |
| Delayed Matching to Sample Test | PerfO | 1 | Cognition (visuospatial memory) | Co-primary | 1 |
| Intra-Extra Dimensional Set Shift Test | PerfO | 1 | Cognition (rule acquisition and reversal) | Co-primary | 1 |
| Paired Associates Learning Task | PerfO | 1 | Cognition (episodic memory & new learning) | Co-primary | 1 |
| Stockings of Cambridge Test | PerfO | 1 | Cognition (spatial planning) | Co-primary | 1 |

COA = clinical outcome assessment; PerfO = performance outcome; PRO = patient-reported outcome

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites; thus, measures included in a trial without endpoint information are listed as “not provided,” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website.

The PerfO measures were used to construct co-primary endpoints, exploratory endpoints, and/or exploratory predictor variables. Additional details about each of these measures are provided in Table 8.

Measures of symptoms and impacts related to osteoarthritis

A total of 10 measures of symptoms and impacts related to osteoarthritis were identified in 4 trials (n = 4/73; 5%) (see Table 9). All 10 of these were PRO measures.

The Western Ontario and McMaster Universities Osteoarthritis Index [36] was most frequently reported (4/4 trials). This index is a self-administered instrument primarily used to measure physical function in patients with osteoarthritis of the knee and hip. The instrument has 24 questions that cover the areas of pain (5 questions), stiffness (2 questions), and physical function (17 questions). In 3 of the trials, this measure was used to derive a co-primary endpoint related to physical function and stiffness, while in the other trial it was used to construct a confirmatory secondary endpoint measuring pain, stiffness, and physical function.
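The subscale structure described above (5 pain, 2 stiffness, and 17 physical function questions) can be sketched as follows. This is illustrative only. Two things are assumptions not stated in the text: that each question is answered on a 0 to 4 scale, and that the questions are ordered pain first, then stiffness, then physical function. Other response formats of the index would need different ranges.

```python
from typing import Dict, Sequence

# Assumed question order: 5 pain, then 2 stiffness, then 17 physical function.
WOMAC_SUBSCALES = {
    "pain": range(0, 5),
    "stiffness": range(5, 7),
    "physical_function": range(7, 24),
}
WOMAC_ITEM_COUNT = 24
MAX_ITEM_SCORE = 4  # assumed 0-4 response scale


def score_womac(items: Sequence[int]) -> Dict[str, int]:
    """Sum the responses within each subscale and overall; higher means worse."""
    if len(items) != WOMAC_ITEM_COUNT:
        raise ValueError("expected responses to all 24 questions")
    if any(not 0 <= value <= MAX_ITEM_SCORE for value in items):
        raise ValueError("each response must be between 0 and 4")
    scores = {
        name: sum(items[i] for i in indices)
        for name, indices in WOMAC_SUBSCALES.items()
    }
    scores["total"] = sum(items)
    return scores
```

With a 0 to 4 response scale, the subscale maxima would be 20 for pain, 8 for stiffness, and 68 for physical function. An endpoint based on physical function and stiffness would then draw on two of the three subscale scores rather than on the total.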

Less frequently reported PRO measures included the Patient Reported Outcomes Measurement Information System - Pain; the Arthritis Self-Efficacy Scale; a pain Numeric Rating Scale; the Knee Injury and Osteoarthritis Outcome Score; the Intermittent and Constant Osteoarthritis Pain measure; three VAS measures (knee pain, physical function impacts due to knee pain, and impact of knee pain on daily life); and the Pain Catastrophizing Scale. These measures were used to derive a co-primary endpoint and a secondary endpoint. Additional details about each of these measures are provided in Table 9 below.

Table 9 COAs identified in review: symptoms and impacts related to osteoarthritis

| Measure | Type of COA | # of Trials (n = 4) | Concept | Endpoint Position | Frequency of Measure Across Trials |
|---|---|---|---|---|---|
| WOMAC | PRO | 4 | Physical function, stiffness | Co-primary | 3 |
| | | | | Confirmatory secondary | 1 |
| ICOAP | PRO | 1 | Pain from osteoarthritis | Confirmatory secondary | 1 |
| KOOS | PRO | 1 | Symptoms of knee osteoarthritis | Co-primary | 1 |
| ASES | PRO | 1 | Arthritis self-efficacy (pain, function, other symptoms) | Secondary | 1 |
| Pain NRS | PRO | 1 | Pain | Secondary | 1 |
| PCS | PRO | 1 | Pain catastrophizing (rumination, magnification, helplessness) | Secondary | 1 |
| PROMIS Pain (Short Form v1.08a) | PRO | 1 | Impact of pain on daily life | Co-primary | 1 |
| VAS Pain | PRO | 1 | Pain due to knee pain | Not described | 1 |
| VAS Physical Function | PRO | 1 | Physical function due to knee pain | Not described | 1 |
| VAS Impact on Daily Life | PRO | 1 | Impact of knee pain on daily life | Not described | 1 |

ASES = Arthritis Self-Efficacy Scale; COA = clinical outcome assessment; ICOAP = Intermittent and Constant Osteoarthritis Pain; KOOS = Knee Injury and Osteoarthritis Outcome Score; NRS = Numeric Rating Scale; PRO = patient-reported outcome; PCS = Pain Catastrophizing Scale; PROMIS = Patient Reported Outcomes Measurement Information System; WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index

Other measures

A total of 14 other measures covering a variety of conceptual domains were identified in 12 trials (n = 12/73; 16%) (see Table 10). Most of these (13/14) were PRO measures, with 1 PerfO measure (the Acetaminophen test) also identified.

Among these various types of measures, the Kansas City Cardiomyopathy Questionnaire (KCCQ) was frequently used (4/12 trials) to form both a secondary endpoint and a co-primary endpoint (based on the clinical summary score). The KCCQ is a disease-specific instrument that primarily measures HRQoL in patients with congestive heart failure. The instrument contains 23 items that measure the areas of physical limitation, self-efficacy, social interference, quality of life, and symptoms [37].

The Weight-Related Sign and Symptom Measure and the International Consultation on Incontinence Questionnaire-Urinary Incontinence-Short Form were each also used in 4 trials, although their endpoint positioning was not described. Less frequently reported PRO measures are summarized in Table 10 below. These measures were mostly used to construct secondary endpoints or exploratory predictor variables, as was the case for the Weight Bias Internalization Scale, Body Satisfaction Scale, Philadelphia Mindfulness Scale, Social Support Scale, Weight Efficacy Lifestyle Questionnaire, and Weight Efficacy Lifestyle Questionnaire Short Form. In contrast, the Monell Forced Choice Test was used to derive a co-primary endpoint. Additionally, the 1 PerfO measure, the Acetaminophen test, was used to derive a co-primary endpoint measuring gastric emptying. Additional details about each of the measures described above are provided in Table 10 below.

Table 10 COAs identified in review: other measures

| Measure | Type of COA | # of Trials (n = 12) | Concept | Endpoint Position | Frequency of Measure Across Trials |
|---|---|---|---|---|---|
| KCCQ | PRO | 4 | Symptoms and physical limitations associated with heart failure (symptom stability, frequency and burden, physical function, social limitation, self-efficacy, quality of life) | Co-primary, Secondary | 4 |
| WRSSM | PRO | 4 | Weight-related signs and symptoms | Not provided | 4 |
| ICIQ-UI-SF | PRO | 4 | Urinary incontinence | Not provided | 4 |
| WBIS | PRO | 2 | Weight bias internalization | Exploratory | 1 |
| | | | | Not provided | 1 |
| Monell Forced Choice Test | PRO | 1 | Food preference (change in sweet/fat concentration) | Co-primary | 1 |
| Body Satisfaction Scale | PRO | 1 | Body satisfaction | Exploratory | 1 |
| PHLMS | PRO | 1 | Mindfulness (present moment awareness, acceptance) | Exploratory predictor variable | 1 |
| GSRS | PRO | 1 | Gastrointestinal symptoms | Secondary | 1 |
| WHGQ | PRO | 1 | Hair growth | Not provided | 1 |
| Men’s Hair Growth Questionnaire | PRO | 1 | Hair growth | Not provided | 1 |
| Nail health survey (unspecified) | PRO | 1 | Nail growth | Not provided | 1 |
| Acetaminophen test | PerfO | 1 | Gastric emptying | Co-primary | 1 |
| Social Support Scale | PRO | 1 | Social support for healthy behavior | Exploratory predictor variable | 1 |
| WEL | PRO | 1 | Internal and external influences on self-efficacy related to weight | Exploratory predictor variable | 1 |
| WEL-SF | PRO | 1 | Internal and external influences on self-efficacy related to weight | Supportive secondary | 1 |

COA = clinical outcome assessment; KCCQ = Kansas City Cardiomyopathy Questionnaire; GSRS = Gastrointestinal Symptom Rating Scale questionnaire; ICIQ-UI-SF = International Consultation on Incontinence Questionnaire-Urinary Incontinence Short Form; PerfO = performance outcome; PHLMS = Philadelphia Mindfulness Scale; PRO = patient-reported outcome; WBIS = Weight Bias Internalization Scale; WEL = Weight Efficacy Lifestyle Questionnaire; WEL-SF = Weight Efficacy Lifestyle Questionnaire Short Form; WHGQ = Women’s Hair Growth Questionnaire; WRSSM = Weight-Related Sign and Symptom Measure

Note: Sponsors are not required to include all secondary and exploratory trial endpoints when listing information on US or EU websites; thus, measures included in a trial without endpoint information are listed as “not provided,” as it is possible the measure was included as a secondary or exploratory endpoint although not reported on the website.

Discussion

As previously mentioned, scientific evidence indicates strong associations between obesity and both morbidity and mortality, and individuals with obesity have a higher risk of certain illnesses (particular cancers, diabetes mellitus, disability, stroke, and hypertension, among others) than their non-obese counterparts [2, 3]. Additionally, there is evidence that obesity is associated with lower HRQoL, and that even individuals with obesity who could currently be considered “healthy” may already be in transition to a future plagued by poor health [38]. The majority of therapies and treatments for obesity, regardless of approach, are concerned with addressing the abovementioned outcomes; however, determining the most appropriate endpoint for a treatment has remained a point of contention [39]. Often, because of its reliable and quantifiable nature, the reduction of body weight is used as a standard in therapies [40], and a focus is placed on the resolution of comorbidities. Nevertheless, many drug development programs for obesity are beginning to incorporate endpoints that use COAs and DHTs to measure the concepts considered most important by patients. The purpose of this targeted literature review was to identify these measures in clinical trials for obesity, as presented in clinical trial registrations from the past 5 years and related published literature, and to present a clear picture of which measures are being used to capture the patient experience and how they are being implemented to construct endpoints within trials.

This targeted review identified a total of 108 COAs and 2 DHTs being used to measure outcomes in clinical trials for obesity. The majority of COAs were PRO measures (n = 83), although some PerfOs (n = 24) were also identified, as well as 1 composite PRO-ClinRO measure. Interestingly, despite the advent and increasing use of DHTs, only 2 were reported, in 2 trials.

A variety of concepts were measured using these COAs/DHTs, with measures of HRQoL most frequently included as secondary endpoints. Specifically, the SF-36 and IWQoL-Lite-CT were most consistently used to derive endpoints in the clinical trials. These measures were used to derive either confirmatory or supportive secondary endpoints, with the PF scores most often used.

Measures of specific aspects of HRQoL, including mental health and physical activity, were also included frequently in clinical trials for obesity. Measures of mental health were often used as safety outcomes, while physical activity measures were most often used to construct secondary endpoints. The PHQ-9 was consistently used as a measure of depression, while the 6-MWT was most frequently used as a measure of physical activity.

Outcomes related to eating-related thoughts and behaviors and to disordered eating were also included across many trials. However, the measures used were generally less consistent across these trials, even though similar concepts were measured, such as disordered eating, binge eating, appetite or palatability, food cravings, food preferences, and food reinforcement.

Less frequently, measures related to osteoarthritis and obstructive sleep apnea comorbidities were identified. Other outcomes related to work, cognition, social support, and comorbidities such as heart failure, incontinence, or hair growth were also included in some trials. Measures specific to weight loss treatment, such as the Treatment-Related Impact Measure-Weight, measures of weight-related symptoms, and measures of weight-related self-efficacy (full and short form), were identified only in single trials or publications.

Limitations

A limitation of this targeted review is that not all COA measures used to construct endpoints in the clinical trials were necessarily disclosed on the FDA/European Medicines Agency (EMA) websites. Only primary and key secondary endpoints used in the endpoint hierarchy to power a study need to be disclosed; additional secondary endpoints and exploratory endpoints are often not listed. This is borne out by the fact that more COAs were reported in the published literature than were disclosed during registration, suggesting an element of selection: sponsors most likely listed the most directly impacted concepts in their endpoint hierarchy (i.e., PF) while still collecting data on broader impacts for dissemination in publications. As reviewers at the EMA/FDA can only comment on what they are presented with by the sponsor, it is difficult to determine with certainty how other COAs may have contributed to registration success and market access. Due to the project’s limited scope, this targeted review was also limited to consideration of clinical trial registrations from the past 5 years and abstracts from the past 3 years. However, given the recent growth in obesity trials that include patient-centered outcomes, it is likely that most COAs and DHTs being consistently included in current trials were captured.

Conclusion

This review of Clinical Outcome Assessment (COA) measures and Digital Health Technologies (DHTs) in registered clinical trials and publications for obesity found that Patient Reported Outcome (PRO) measures were the most common type of COA used to develop endpoints, while current use of DHTs was limited. Moreover, multi-dimensional PRO measures assessing health-related quality of life (HRQoL) were most often used. Specifically, the SF-36 (a generic measure) and the IWQoL-Lite/IWQoL-Lite-CT (a disease-specific measure) have the most evidence of use in clinical trials for obesity. Most often, these measures, along with other HRQoL PRO measures, have been used in Phase 2–4 clinical trials (most frequently in Phase 3) to construct secondary endpoints, usually considering outcomes associated with physical function. An interesting next step would be to investigate how COA data are viewed by regulators and payers to understand the importance of such data during regulatory interactions. Additional research is also needed to understand whether the most frequently used measures are considered adequate for assessing outcomes in clinical trials for obesity or whether new measures are required to more adequately assess the concepts of interest, especially with next-generation treatments.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Bibliography

1. US Food and Drug Administration (2020) Clinical Outcome Assessment (COA): Frequently Asked Questions. https://www.fda.gov/about-fda/clinical-outcome-assessment-coa-frequently-asked-questions. Accessed 13 May 2024
2. US Food and Drug Administration (2023) Digital Health Technologies for Remote Data Acquisition in Clinical Investigations: Draft Guidance for Industry, Investigators, and Other Stakeholders. United States Food & Drug Administration. https://www.fda.gov/media/155022/download. Accessed 13 May 2024
3. Kolotkin RL, Williams VSL, von Huth Smith L et al (2021) Confirmatory psychometric evaluations of the Impact of Weight on Quality of Life-Lite Clinical Trials Version (IWQOL-Lite-CT). Clin Obes 11(5):e12477. https://doi.org/10.1111/cob.12477. PMCID: PMC9285468; PMID: 34296522
4. EUROQOL (2024) EQ-5D-5L. https://euroqol.org/information-and-support/euroqol-instruments/eq-5d-5l/. Accessed 13 May 2024
5. World Health Organization (2024) WHOQOL: Measuring Quality of Life. https://www.who.int/tools/whoqol/whoqol-bref. Accessed 13 May 2024
6. American Psychological Association (2019) Patient Health Questionnaire-9. https://www.apa.org/depression-guideline/patient-health-questionnaire.pdf. Accessed 13 May 2024
7. Delgado DA, Lambert BS, Boutris N et al (2018) Validation of digital visual analog scale pain scoring with a traditional paper-based visual analog scale in adults. J Am Acad Orthop Surg Glob Res Rev 2(3):e088. https://doi.org/10.5435/JAAOSGlobal-D-17-00088. PMCID: PMC6132313; PMID: 30211382
8. Reilly Associates (2010) WPAI: SHP V 2.0. http://www.reillyassociates.net/wpai_shp.html. Accessed 13 May 2024