Ninety‐Seven Percent of Trials Investigating Robotic Interventions in Physiotherapy Contained Abstract Spin: A Meta‐Research Review

Hilary Tier; Jana Verveer; David B. Anderson; Camila Quel De Oliveira; Nicci Bartley; Poonam Mehta; Rafael Z. Pinto; Arianne P. Verhagen; Alana B. McCambridge; Peter W. Stubbs

PMC · DOI:10.1002/cesm.70072·February 12, 2026

Ninety‐Seven Percent of Trials Investigating Robotic Interventions in Physiotherapy Contained Abstract Spin: A Meta‐Research Review

Hilary Tier, Jana Verveer, David B. Anderson, Camila Quel De Oliveira, Nicci Bartley, Poonam Mehta, Rafael Z. Pinto, Arianne P. Verhagen, Alana B. McCambridge, Peter W. Stubbs

PDF

Open Access

TL;DR

This study found that 97% of clinical trial abstracts on robotic interventions in physiotherapy contain misleading information, often omitting negative results or overemphasizing non-significant findings.

Contribution

The study is the first to systematically assess the prevalence of abstract spin in robotic physiotherapy trials, revealing a high rate of misleading reporting.

Findings

01

97% of trials had at least one instance of abstract spin.

02

90% of abstracts failed to mention adverse events.

03

77% overenthusiastically interpreted non-significant primary outcomes.

Abstract

Abstract spin involves misrepresenting or misreporting study findings in the abstract of an article. The abstract is the most easily accessible part of the article and may determine if an article is read, purchased or the findings are implemented into practice. Trials using new technologies, such as robotics, may be particularly vulnerable to spin due to the high costs associated with research and development. To identify and assess abstract spin in physiotherapy clinical trials investigating robotic interventions. Meta‐research review. We searched the Physiotherapy Evidence Database (PEDro) in August 2024 for two‐armed clinical trials investigating robotic interventions compared to nonrobotic interventions, in any patient population. Article screening and data extraction were performed by two people independently. Quality assessment was performed using the PEDro scale with PEDro…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Figures1

Click any figure to enlarge with its caption.

Tables3

Table 1. Characteristics of the included trials (n = 160).

Characteristics	% or mean (SD)
Patient population
Neurological	95
Orthopaedics	3.1
Burns	1.9
Geographical location of trial
Europe	41.3
Asia	37.5
North America	16.9
Other (Africa, South America, Oceania)	4.4
Publication year
≤ 2010	8.8
2011–2015	27.5
2016–2020	38.1
2021–2024	25.6
Primary outcomes
Defined/undefined	72.5/27.5
Number of primary outcomesa
1	21.9
2–5	40
6–10	20
11–15	10.6
16+	7.5
Primary timepoints
Defined/undefined	64.4/35.6
Number of primary timepointsb
1	60.6
2	26.9
3+	12.5
Registered protocol	45.0
Abstract word count
≤ 175	6.3
176–225	23.8
226–275	35.0
276–325	20.6
≥ 326	14.4
Quality assessment (% Yes)
Eligibility criteria (not scored)	81.9
Randomisation	100.0
Concealed allocation	41.9
Baseline similarities	93.1
Subject blinding	0.0
Therapist blinding	0.0
Assessor blinding	68.1
Adequate follow‐up	64.4
Intention‐to‐treat analysis	35.6
Between‐group comparisons	97.5
Point estimates and variability	94.3
Total PEDro score	6.0 (1.4)

Table 2. Number of trials that scored Yes (spin present), No (spin not present) and Not relevant for each spin item.

Spin item	Yes (spin present) n (%)	No (spin not present) n (%)	Not relevant n (%)
1. Omission of primary outcomes	78a (48.8)	82 (51.3)	N/A
2. Fail to report non‐significant primary outcomes	84a (52.5)	50 (31.3)	26 (16.3)
3. Selective reporting of positive results and omission of negative results of primary outcomes	2 (1.3)	4 (2.5)	154 (96.3)
4. Focus on statistically significant outcomes other than the primary	48 (30.0)	65 (40.6)	47 (29.4)
5. Fail to mention adverse events of the intervention	144 (90.0)	16 (10.0)	N/A
6. Overenthusiastic interpretation of statistically nonsignificant primary outcomes results as effective	123 (76.9)	10 (6.3)	27 (16.9)
7. Recommendation of a treatment without a clinically important effect on the primary outcome	36 (22.5)	1 (0.6)	123 (76.9)

Table 3. Fleiss' κ values and percent agreement between raters for each spin item.

Spin item	Fleiss' κ (95% CI's)	Percent agreement (%)
1. Omission of primary outcomes	0.48 (0.32–0.63)	75.0
2. Fail to report nonsignificant primary outcomes	0.53 (0.37–0.68)	76.3
3. Selective reporting of positive results and omission of negative results of primary outcomes	0.10 (−0.05 to 0.26)	92.5
4. Focus on statistically significant outcomes other than the primary	0.60 (0.45–0.76)	83.1
5. Fail to mention adverse events of the intervention	0.78 (0.62–0.93)	95.6
6. Overenthusiastic interpretation of statistically nonsignificant primary outcomes results as effective	0.35 (0.20–0.51)	72.5
7. Recommendation of a treatment without a clinically important effect on the primary outcome	0.28 (0.12–0.43)	73.8

Keywords

agreementclinical trialsmisreportingmisrepresentationneurologyroboticstechnology

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsStroke Rehabilitation and Recovery · Social Robot Interaction and HRI · Traumatic Brain Injury Research

Full text

Background

1

Abstract spin is the misreporting, misrepresentation, and inappropriate extrapolation of a study's findings in an abstract, which omits or fails to faithfully reflect the methods or results and usually provides a more positive interpretation of the study [1, 2]. Spin is common in medical literature [3, 4] with increased attention to spin in recent years [5, 6, 7, 8, 9, 10, 11, 12, 13]. Although spin can occur throughout an article, it is particularly impactful in an abstract. The abstract is often the first part of an article that a reader has access to, readers may choose to read or purchase an article based on the abstract and some people may only read the abstract [14, 15]. Given this, abstracts that contain spin are more likely to have their full texts read and interventions potentially implemented. The importance of spin in an abstract was shown in a study of cancer trials [16]. Three‐hundred clinicians were randomized to read two abstracts, one with spin and the other without, and asked to rate the benefit of the treatment, rigorousness of the methods and interested in reading the full article [16]. Clinicians presented with the abstract with spin rated the treatment as more beneficial, reported methods as being more rigorous and were more likely to want to read the full text [16]. For these reasons, it is important that abstracts fairly reflect the findings of the study so that readers can make informed and appropriate decisions.

There are multiple reasons for abstract spin and findings framed more positively than they are. For academics, the pressure to publish is high with job performance indicators often linked to number of publications and impact factor of the journal of publication [17]. As research with positive results is easier and more likely to be published, academics may feel compelled to present results that are framed more positively [17]. In pharmaceutical research there are often large costs involved in manufacturing, prototyping and initial nonclinical and clinical testing. This means publishing an article with a positive abstract may be preferred to gain interest in the treatment and increase its uptake. For example, when assessing the whole article, pharmaceutical studies funded by for‐profit organizations are more likely to report positive conclusions, due to biased interpretation of study results [18]. When findings are misrepresented or misreported, consumers of research may use or advocate for treatments that don't provide the expected effect, are ineffective, or potentially harmful.

Many studies have investigated spin in biomedical literature and abstract spin has been investigated in multiple clinical areas including health and medical research [19], pharmacology [20], surgery [7, 21, 22, 23, 24, 25, 26], dentistry [27, 28, 29, 30, 31, 32, 33, 34, 35, 36], and physiotherapy [6, 37, 38, 39]. Abstract spin in these areas had considerable variability, ranging from 3% [34] to 98% [6] of included articles. One area that abstract spin has not been investigated is in interventions using robotics. Studies using robotic interventions are particularly vulnerable to abstract spin due to the high costs associated with research and development. The worldwide spending on rehabilitation robots is high, and individual units can cost as much as $500,000 (USD). Given this, there are large incentives to show that these are effective. Cochrane reviews on upper‐ [40] and lower‐limb [41] robotics in stroke populations against physiotherapy or usual care show moderate to high certainty evidence that posttreatment between group differences are only small to moderate for activity of daily living scores (standardized mean difference (SMD 0.31), arm function (SMD 0.32), arm strength (SMD 0.46), 6‐min walk distance (11 m), mean walking velocity (0.05 m/s), and independent walking (odds ratio 1.65). A review investigating only overground robotic exoskeletons showed very‐low certainty evidence for all outcomes [42]. Studies in people with acquired brain injury [43], mixed neurological conditions [44], spinal cord injury [45] and Parkinsons Disease [46] mainly showed low or very‐low certainty evidence, and the few comparisons with moderate to high certainty evidence showed small effect sizes. Given the small effect sizes and uncertain evidence, as well as the cost of the devices to hospitals and research institutions, there is a high risk that these studies may want to oversell the results to make an intervention appear more effective.

Given this, it is important to investigate spin in studies using robotic interventions as this has not yet been investigated, and there is a high possibility of abstract spin. For this article we investigated abstract spin using the 7‐item checklist developed by Nascimento et al. [6] informed by the widely referenced guidelines developed by Boutron [3]. In our protocol article, we provided expanded descriptions of each item from that checklist to reduce the ambiguity of scoring [47]. For the current study, we have chosen to use this checklist with updated definitions [47]. The primary aim of the current study was to assess the amount and type of abstract spin in physiotherapy robotics clinical trials indexed in PEDro. The secondary aims are to provide rater agreement in using this spin checklist and explore potential factors related to spin.

Methods

2

Design

2.1

This study has a meta‐research design using guidance from the Preferred Reporting Items for Systematic reviews and Meta‐Analyses (PRISMA) guidelines [48]. The protocol was published on October 19, 2020 [47].

Search Strategy

2.2

The search was performed in the Physiotherapy Evidence database (PEDro) and has been described previously [47]. PEDro indexes clinical trials, systematic reviews, and clinical practice guidelines within the scope (or future scope) of physiotherapy practice [49]. PEDro is updated using monthly searches of Medline, Embase, CINAHL, CENTRAL AMED, and PsycINFO and articles are included in the database regardless of registration status, journal, language or methodological quality [49, 50, 51]. Included articles were clinical trials indexed in PEDro. Searches were performed using the “clinical trials” filter, using terms searching the titles and abstracts related to robotics. These were “robot*’” “Exoskel*,” “Electro mechanic*,” “Electromechanic*,” “automat*,” “orthotic*,” “orthos*,” “driven,” “computer aided” and “computer assist.” The search was performed on August 2, 2024.

Selection Criteria

2.3

Included trials were two‐armed randomized trials written in English investigating a robotic intervention (alone or with another nonrobotic intervention) compared to a non‐robotic intervention. We excluded multi‐armed trials to ensure that the trials were as comparable as possible as trials with more arms would have more primary comparisons which may increase the likelihood of spin. We defined a robotic intervention as “a machine capable of carrying out a complex series of actions automatically” [47, 52, p. 2]. Some of examples of devices that are considered robotics are the Lokomat [53], Ekso‐GT [54], Morning Walk [55], Hybrid Assistive Limb [56], Bi‐Manu‐Track [57], InMotion‐1 and ‐2 (MIT‐MANUS) [58, 59], Armeo Spring [60], and Neuro‐Rehabilitation‐Robot (NeReBot) [61]. Some examples of interventions that are not considered robotics are the SaeboFlex [62], wearable exoskeleton stride management assist system (SMA) [63], Trunk stabilization training robot (3DBT‐33) [64], SMART Arm [65], Neurocom Pro‐Balance Master [66], Therasuit [67], LOCOBOT [68], Hunova [69], Motorized Ankle Stretcher [70], AnkleMotus [71], and experimental devices used to elicit stretch reflexes [72, 73, 74]. Trials could investigate any population, outcome or timepoint. Excluded studies were pseudo‐randomized studies, nonrandomized studies, cross‐over trials, cluster trials or studies that had robotic interventions in both study arms.

Study Selection

2.4

Screening of the title/abstract and full text for eligible trials were independently performed against the inclusion/exclusion criteria by two raters in Covidence. Disagreements were discussed until consensus was reached.

Data Extraction

2.5

All included trials underwent spin assessment and data extraction. Data were extracted by two authors independently using a custom‐made data extraction form in Excel (Version 16.70 (Build 23021201)). Disagreements in data extraction were resolved through discussion. Extracted data were the number of authors, publication year, patient population, geographical location, abstract length, number of randomized patients, number of primary outcomes and timepoints, funding and funding type, conflict of interest statement, and if a conflict of interest was reported, journal impact factor and trial preregistration.

Quality Assessment

2.6

Trial quality was rated using the PEDro quality assessment scale [75, 76]. The PEDro scale contains 11 items, of which 10 are scored. The first item relates to external validity and is not scored. Items 2–9 related to the internal validity with questions on randomization, concealment, baseline similarities, blinding (patient, therapist and assessor), loss to follow‐up and intention to treat analysis. Items 10 and 11 relate to statistical reporting and includes reporting of point estimates and variability and between‐group statistical comparisons. Scores are summed from 0 (no items met) to 10 (all items met). In our study, we used the ratings (from 0 to 10) provided on the PEDro website. These have been rated by two experienced raters and verified by a third rater.

Assessment of Spin

2.7

We used the spin‐item categories defined previously [6] with updated item descriptions [47] (Appendix 2). Item descriptions were expanded to reduce ambiguity of the initial descriptions [47]. The items were omission of primary outcomes (item 1), failing to report between‐group non‐significant primary outcomes (item 2), selectively reporting of positive results and omission of negative between‐group results of primary outcomes (item 3), focussing on statistically significant outcomes other than the primary (item 4), failing to mention adverse events of the intervention (item 5), overenthusiastically interpreting statistically nonsignificant primary outcomes results as effective (item 6), recommending a treatment without a clinically important effect on the primary outcome (item 7) [13, 41]. Items 1 and 5 were rated “Yes” or “No” with all other items rated “Yes,” “No” or “Not Relevant.”

Raters assessed if the abstract demonstrated spin in corroboration with the full text, with primary outcomes/timepoints being determined from an investigation of the full text. For spin item 1, the same primary outcomes/timepoints also needed to be mentioned in the abstract. If between group primary outcomes/timepoints were non‐significant (item 2) or negative (item 3) in the text, these needed to be presented in the abstract. When the full‐text did not provide between‐group differences, these were calculated using the mean, SD and number in each group using a calculator to determine the mean difference and 95% confidence intervals [77]. For item 4, the between‐group primary outcome/timepoint needed to be omitted in favor of a secondary outcome (that didn't need to be mentioned in the full‐text). For the measurement of adverse events, if stated in the abstract, these needed to align with the adverse events mentioned in the full‐text. For item 6, when nonsignificant between group differences of primary outcomes had been determined in the full‐text, abstracts could provide overenthusiastic reporting of within‐group differences or used statements implying significance, whether mentioned in the full‐text or not. For determination of the smallest clinically worthwhile effect (SCWE), we used the SCWE provided in the full‐text. When this was not provided, we used 15% of the scale range when a scale was used or a 15% difference between measures of the postintervention value of the intervention group for continuous measures.

If the abstract or full‐text of a trial did not nominate a primary outcome all outcomes were deemed primary. If the abstract or full‐text of a trial did not nominate a primary timepoint, all measurement timepoints were deemed primary. If the abstract or full‐text of the trial only measured one outcome or one timepoint, that outcome or timepoint was deemed primary.

Two authors independently rated the spin items using a custom‐made data extraction form in Qualtrics. If initial ratings did not agree, consensus was attained through discussion and, if required, through consultation with a third author.

Data Analysis

2.8

Data on included trial characteristics were tabulated using counts, percentages and means (standard deviation), where appropriate. The number of trials that used spin were tabulated for each spin item and Yes, No and Not relevant responses were summarized as counts and percentages.

Agreement in spin ratings: Agreement in spin ratings was compared between authors using Fleiss' kappa for items with “Yes” and “No” responses only (items 1 and 5) or “Yes,” “Not Yes” (No and Not Relevant) (items 2–4, 6 and 7) as per the protocol. Kappa values were interpreted as slight (0–0.20), minimal (0.21–0.39), weak (0.40–0.59), moderate (0.60–0.79), strong (0.80–0.90), and almost perfect (> 0.90) agreement [78]. Fleiss' κ values were tabulated as the κ value (95% Confidence intervals) with the percentage agreement.

Exploratory analyses: One‐way analysis of variance (ANOVA), unpaired t‐tests and linear regression analyses were performed to assess potential differences/relationships between independent variables and the amount of spin. For all tests, the amount of spin was the dependent variable (on a 0–7 scale). Independent variables assessed using one‐way ANOVA were continent (North America, Asia, Europe, Other), Funding reported (Yes, No, N/A [i.e., No funding section]), whether the trial declared a Conflict of Interest (COI) (Yes, No, N/A (No COI section)), whether the trial was linked to industry through COI or funding (Yes, No, unknown), journal 2‐year citations per document (from Scimago Journal and Country Rank) (no data and 0–1, 1–2, 2–5, > 5) and abstract word length (≤ 175, 176–225, 226–275, 276–325, ≥ 326). ANOVA results were presented as the F‐value with degrees of freedom and p value. If ANOVA were significant, post hoc t‐tests were performed to identify the differences between groups. Independent variables assessed using unpaired t‐tests were protocol registration (Yes, No), area of physiotherapy (Neurology, Other) and whether the primary outcome was defined (Yes, No). These were presented as mean differences (MD) and 95% confidence intervals (95% CI). Independent variables assessed using linear regression analyses were PEDro score (1–10) and year of publication. These were presented as unstandardized beta coefficients (unstandardized β) and 95% CI.

Deviations From the Protocol

2.9

We intended to choose a selection of 100 randomized clinical trials. However, as the number of trials was only 160, we decided to assess all trials. We intended to include quasi‐randomized trials but changed the inclusion criteria to only include randomized trials. This was to ensure the included trials were as rigorous as possible. The spin checklist was piloted by spin raters by screening five trials not included in the final selection. Item 7 required a SCWE to determine if authors used spin in their clinical interpretation of the findings. If a SCWE was reported for primary outcomes in the manuscript, this was used. Initially, when this was not reported in the primary manuscript, authors searched the available literature for the SCWE. However, in practice screening, 0/5 trials reported a SCWE, so we stopped searching for the SCWE in the literature as (1) it became too time‐consuming (i.e., One trial had 20+ primary outcomes), (2) the SCWE often did not exist for the outcome measures relating to many of the populations of interest, severity of condition and phase postinjury, and (3) there were often sometimes multiple (different) SCWE estimates. As such, we developed an arbitrary rule that the SCWE was 15% of the scale range when a scale was used or a 15% difference between measures of the postintervention value of the intervention group for continuous measures. In the protocol, we intended to compare spin of experienced and inexperienced raters. Data detailing inexperienced raters were presented elsewhere [79].

Results

3

Search Results

3.1

We retrieved 1384 records in PEDro, screened 1381 titles and abstracts, and examined 299 full texts. After examining full texts, 160 trials were included. Figure 1 shows the flow of studies through the screening process.

Flow of studies through the review.

Characteristics of Included Trials

3.2

Trials were mainly performed with neurological populations (95%) and were from Europe (41.3%), Asia (37.5%), and North America (17.9%) (Table 1). The mean (SD) PEDro score was 6.0 (1.4) and ranged from 2 to 8, with 61% of trials being of high quality. All trials were randomized and no trials blinded participants or therapists. Concealed allocation (41.9%) and intention‐to‐treat analysis (35.6%) were the items that were least met. Most trials defined primary outcomes (72.5%) or timepoints (63.5%), had 1 (21.9%), 2–5 (40.0%), or 6–10 (20.0%) primary outcomes and 1 (60.6%) or 2 (26.9%) primary timepoints. Less than half of trials (45.0%) had registered protocols. Abstract word counts were mainly 176–225 (23.8%), 226–275 (35.0%), and 276–325 (20.6%) words.

Assessment of Spin

3.3

The most common form of spin was failing to mention adverse events (90.0%) followed by overenthusiastic interpretation of nonsignificant primary outcomes as effective (76.9%) (Table 2). For spin item 3, most trials (96.3%) did not have any negative primary outcomes and were scored not relevant. The amount of positive spin items in each trial ranged from 0 to 6; 0 items (6 trials, 3.1%), 1 item (25 trials, 15.6%), 2 items (34 trials, 21.3%), 3 items (17 trials, 10.6%), 4 items (37 trials, 23.1%), 5 items (29 trials, 18.1%), 6 items (13 trials, 8.1%), and 7 items (0 trials, 0%).

Rater Agreement

3.4

Percent agreement between raters ranged from 72.5% to 95.6%. Fleiss' κ values were moderate (items 4 and 5), weak (items 1 and 2), minimal (items 6 and 7) and slight agreement that was no better than chance (item 3) (Table 3).

Exploratory Analysis

3.5

One‐way ANOVA showed no significant between group differences for geographical location (F (3, 156) = 0.45, p = 0.99), abstract word count (F (4, 155) = 1.32, p = 0.26), journal 2‐year citation/document (F (4, 155) = 0.63, p = 0.64), whether the trial was funded (F (2, 157) = 0.74, p = 0.49), whether the trial had a COI (F (2, 157) = 0.87, p = 0.42), whether the trial had an industry COI (F (2, 157) = 0.29, p = 0.75). Unpaired t‐tests showed no difference in the amount of spin between trials with and without a registered protocol (MD: 0.39, 95% CI: −0.14 to 0.91), with or without a defined primary outcome (MD: 0.145, 95% CI −0.44 to 0.73) and between trials in neurological physiotherapy and Other areas (MD: 0.23, 95% CI: −0.97 to 1.42). Regression analyses showed that the amount of spin was not associated with year of publication (unstandardized β: −0.02, 95% CI −0.08 to 0.04). The amount of spin was significantly associated with trial quality (unstandardized β: −0.22, 95% CI −0.41 to −0.03), with higher quality trials having lower spin.

Discussion

4

The current study found spin in most abstracts of physiotherapy trials investigating robotics with 97% of trials containing at least one item of spin. Most commonly, authors failed to report adverse events and provided an overenthusiastic interpretation of nonsignificant results. Approximately half of the trials focussed on the significant results of their primary outcomes and failed to report non‐significant primary outcomes. Over one‐fifth of trials recommended a treatment without a clinically important effect on the primary outcome, however most did not make any clinical recommendation. Exploratory analyses showed that the amount of spin in abstracts was weakly related to trial quality and not related to abstract word count, geographical location, protocol registration, publication year, COI or funding.

Comparison With Other Literature

4.1

Non‐reporting of adverse events was the most common example of abstract spin. This is comparable with trials in physiotherapy for low back pain that also found that non‐reporting of adverse events was high (93.5%) [6]. Adverse events are frequently underreported in non‐medical intervention trials [47]. Nonreporting of the adverse events in the abstract (and often the full‐text) in robotics trials could due to minor nature of the adverse events experienced. As adverse events are poorly reported, it is difficult gauge the impact of the adverse events experienced although conceivably these could include minor events such as skin irritation, redness, abrasions, skin break down, fear of falling, motion sickness and general discomfort or pain. Moderate events could include falls, muscle tears, shoulder subluxation. Although the events may have been minor, or not happened at all, these should be reported in the abstract so readers can make an informed treatment decision or provide a patient with an informed choice to use or not use the intervention. Even if minor, adverse events should still be reported in the abstract or a statement that no adverse events occurred is needed, as information on the safety of an intervention is crucial for the implementation of the intervention. Providing this in the abstract provides transparency which is why reporting of adverse events is recommended in the Consolidated Standards of Reporting Trials (CONSORT) statement [80, 81]. Although the authors intention may not be to mislead the reader, it is important to report such issues to ensure transparency for all consumers of the literature.

Overenthusiastic interpretation of statistically non‐significant primary outcomes was another commonly scored item. In comparison to previous literature, this was more common in robotic trials than in trials in low back pain (61.5%) [6], traumatic brain injury (60.6%) [38] and physiotherapy (73%) [37]. This was mainly due to conclusions on the effectiveness of interventions being based on within‐group changes, when between‐group differences were nonsignificant. This is not surprising as many reviews have highlighted that robotic interventions have indifferent or small treatment effects. Although the intentions of the authors cannot be assumed, using within‐group comparisons makes the robotics interventions appear effective when they are not. This overstates the effectiveness of the interventions. It is important that these trials report between‐group comparisons to ensure that placebo effects, Hawthorne effects, regression to the mean and natural recovery are controlled for. As such, interpreting treatment effectiveness from within‐group comparisons is inappropriate [82] and demonstrates poor statistical/methodological rigour.

Reporting of any negative findings (i.e., favoring the control over the robotic intervention) was limited, with only 2/6 trials with negative findings reporting negative findings in the abstract and 154 trials not reporting any negative findings. Such an overwhelmingly positive slant could suggest selective reporting of positive outcomes or selective publication of trials with positive results [83, 84]. Although assessing the content and accuracy of protocols were beyond the scope of this study, 55% of trials did not refer to any protocol. This means that outcomes could be selected or changed, without any a priori record, as is commonly done in physiotherapy literature [85, 86]. As a result, negative outcomes in robotic literature are likely more common but the extent of this remains unknown.

More than one in five robotics trials recommended a treatment without a clinically meaningful effect of the primary outcome. This is concerning as many trials based their recommendation of treatment on statistical significance rather than clinical significance. This is not unique to robotics trials, but the potential implications and equipment required for people wishing to use the interventions, compared to other physiotherapy interventions, are more costly. If presented appropriately, the interventions will appear less effective or ineffective, and may not be used further. Further, readers may have unrealistic expectations of treatment effectiveness which may result in unjustified optimism of the treatment for the clinician and patient. In 2022, the International Society of Physiotherapy Editors endorsed the use of an estimation‐based approach [87, 88] as well as reporting effect sizes and 95% CI relative to the smallest clinically worthwhile effect. As most of the trials in the current review were published prior to 2022 and many articles were published in journals that were not part of the joint editorial, it is unrealistic to expect that any of the trials will have followed this advice. However, it would help the interpretation of the clinical results in abstracts. To aid in the interpretation of trials relative to the SCWE, authors are encouraged to use guidelines proposed by Kamper [89] or Herbert et al. [90].

The exploratory analyses showed that most investigated factors did not explain the amount of spin in a trial. Some trials investigating the relationship between factors such as journal impact factor [91], trial quality [6], abstract word count [6], trial registration [32], multicentre trials [6], and amount of international collaboration [32] have shown significant relationships while others have shown no relationships for any variable [4, 7, 92]. The only significant exploratory test was that spin was associated with trial quality. The effect was very small as for a 7‐point increase in spin (the scale range), the PEDro quality would reduce by 1.61 points. Although studies, including ours, have shown that that high spin is related to total lower trial quality [6], these findings are not universal [93].

User Agreement of the Spin Checklist With Recommendations

4.2

Kappa and agreement values using previous versions of this checklist have been minimal to weak [6]. We created and updated item definitions and performed five calibrations of abstract spin ratings on diverse trials prior to rating abstracts in an attempt to improve user agreement. However, the kappa values remained minimal or weak (4 items) or no better than chance (1 item). A reason for poor agreement was likely due to the difficulty in identifying the primary outcome/timepoint when studies did not define a primary outcome/timepoint. If the primary outcome/timepoints were identified incorrectly, spin items 1–4, 6, and 7 could be scored incorrectly. Seventy‐three percent of studies did not report primary outcomes, and 64% of studies did not report primary timepoints (when there was more than one outcome/timepoint). Some studies, such as those investigating gait parameters had 20+ outcomes, without defining a primary outcome. We would recommend adjusting the checklist so that if the primary outcome is not defined, then the using the first mentioned outcome in the methods section. In studies with more than 1 timepoint, when a primary timepoint has not been defined, then the timepoint immediately after the intervention could be used. These changes would reduce the disagreements because of the misidentification of primary outcomes.

For spin item 3 “…omission of negative results of primary outcomes” raters sometimes confused “negative” results and “non‐significant” results. For this item, having a highlighted explanation of what “negative” means within the checklist would be helpful. It is also important to ensure that practice trials are purposely selected so that at least one practice trial reports a negative result.

The direction of a significantly “better” outcome measure was sometimes unclear or misinterpreted (i.e., higher scores were considered better, when lower scores were better). This was particularly problematic for studies on gait parameters or studies with unfamiliar outcome measures. Ideally, prior to assessing articles for spin, authors could have an “outcome measures” spreadsheet, with the direction of positive findings. This can be added to if a new outcome measure is encountered.

For spin item 6, the overenthusiastic interpretation of statistically nonsignificant items as effective, could have been broken into its multiple parts such as “using with group differences” and “using verbiage to imply significance.” This would provide a more discriminative assessment of this item.

For spin item 7, ambiguous recommendations were challenging to score with authors using vague, positively slanted language such as “…seems promising in gait rehabilitation…may be useful to plan highly patient‐tailored gait rehabilitation protocols…” [94] or “…is a treatment option….” [95]. This ambiguity in recommendations was frequent and resulted in disagreements between raters. Having strict a priori defined rules about what constitutes a recommendation, including multiple examples would improve agreement.

Further, guidance documents should be created. These documents should account for more scenarios and provide examples for these scenarios.

Strengths and Limitations

4.3

The strength of our study is that we assessed spin in all physiotherapy related trials in robotics. The spin checklist was refined and trialled using multiple raters. Each rater was provided five calibration trials to ensure familiarity with the spin checklist. Although there were some deviations from protocol, these have been transparently reported and justified.

There were also some limitations. We were perhaps overly stringent in applying the spin checklist. For example, many trials did not specify the primary timepoint, used multiple analyses (i.e., between group analyses and regression analyses), or analyzed the items of a scale instead of the total scale. In these instances, the abstracts needed to report the results of all timepoints/analyses or provide an appropriate summary to be scored as having no spin. The overreliance on the primary outcome in the current checklist is potentially problematic especially when 28% of trials did not define a primary outcome. In trials that didn't report a primary outcome, all outcomes were deemed primary. This meant that these trials had many primary outcomes, and it would be very difficult to report all outcomes in the abstract. Sometimes in the same trial, some primary analyses were (clinically) significant and some were non‐significant, yet a positive clinical recommendation was made. These were deemed as having spin, although this could be seen as being overly stringent. Ideally, we would like authors to identify their primary outcomes and comment on the uncertainty of findings for the results of primary outcomes. We only included two‐armed trials and excluded multiarmed trials. Our rationale was to ensure that the trials were as comparable as possible. Trials with more arms would have more primary comparisons which may increase the likelihood of spin given the limited space in the abstract. We only searched the PEDro database which could be seen as a limitation as we may have potential missed trials. Despite this, the PEDro database is a comprehensive overview of physiotherapy literature and has significant overlap of indexed physiotherapy trials with CENTRAL, EMBASE and PubMed [96]. The PEDro database indexes trials on physiotherapy interventions or interventions that could become physiotherapy interventions in the future, regardless of registration status, journal, language, or methodological quality [49, 50]. Further, PEDro is updated with monthly searches of Medline, Embase, CINAHL, CENTRAL AMED, and PsycINFO with citation tracking and notification of new studies from individuals (such as academics or clinicians) [50, 51]. Although the PEDro scale has an “advanced search option,” it lacks Medical Subject Headings or the ability to search using complex search strings which may limit search sensitivity.

Future Directions

4.4

Ninety‐seven percent of studies in robotics articles contain spin. This means that many robotics abstracts are misrepresenting the articles that they are summarizing. This is concerning as the treatments are being misrepresented which impacts reader perception of treatment effectiveness and could determine if people access a study. There needs to be more discussion of abstract spin in robotics, and its potential consequences, through discussions in conferences, editorials, letters, online forums and education of all article stakeholders. Given the cost of many robotic interventions, it is important that these are presented fairly in the abstract.

Currently, there is no accepted spin checklist for monitoring abstract spin. Our study has contributed to a checklist by trialing a modified checklist and providing suggestions for a future checklist. Standardization of an abstract spin checklist would aid in the comparability between studies investigating abstract spin, and an endorsement by CONSORT or the International Society of Physiotherapy Editors would increase the visibility of abstract spin in the literature. A uniform and widely accepted spin checklist would be useful for authors who write abstracts, journal editors who are the gatekeepers to articles being published and consumers who can read an abstract and identify spin. Given this, eliminating spin or ensuring that people are educated about spin is important to reduce the potential impacts of spin.

Conclusion

5

Ninety‐seven percent physiotherapy trials investigating robotic interventions contained spin. We caution consumers of robotics research to engage thoroughly and critically with articles in this field of literature. Most commonly authors failed to report adverse events and overenthusiastically interpreted statistically non‐significant primary outcomes. The inter‐rater agreement of the checklist ranged from slight agreement that was no better than chance to moderate agreement. We would recommend further refining the definitions of each checklist item.

Author Contributions

Hilary Tier: investigation, writing – review and editing. Jana Verveer: investigation, writing – review and editing. David B. Anderson: investigation, writing – review and editing. Camila Quel De Oliveira: investigation, writing – review and editing. Nicci Bartley: investigation, writing – review and editing, methodology. Poonam Mehta: conceptualization, investigation, writing – review and editing, methodology. Rafael Z. Pinto: conceptualization, investigation, methodology, writing – review and editing. Arianne P. Verhagen: conceptualization, investigation, writing – review and editing, methodology. Alana B. McCambridge: conceptualization, investigation, writing – review and editing, methodology. Peter W. Stubbs: conceptualization, investigation, writing – original draft, visualization, validation, writing – review and editing, project administration, supervision, formal analysis, methodology.

Funding

The authors received no specific funding for this work.

Ethics Statement

The authors have nothing to report.

Consent

The authors have nothing to report.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Supplmentary dataset SUBMIT.

Declaration+of+Interest FORM.

Bibliography96

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1J. Bailar , “How to Distort the Scientific Record Without Actually Lying: Truth, and the Arts of Science,” European Journal of Oncology 11, no. 4 (2007): 217–224.
2I. Boutron and P. Ravaud , “Misrepresentation and Distortion of Research in Biomedical Literature,” Proceedings of the National Academy of Sciences 115, no. 11 (2018): 2613–2619, 10.1073/pnas.1710755115.PMC 585651029531025 · doi ↗ · pubmed ↗
3I. Boutron , “Reporting and Interpretation of Randomized Controlled Trials With Statistically Nonsignificant Results for Primary Outcomes,” Journal of the American Medical Association 303, no. 20 (2010): 2058, 10.1001/jama.2010.651.20501928 · doi ↗ · pubmed ↗
4K. Chiu , Q. Grundy , and L. Bero , “‘Spin’ in Published Biomedical Literature: A Methodological Systematic Review,” P Lo S Biology 15 9 (2017): e 2002173, 10.1371/journal.pbio.2002173.28892482 PMC 5593172 · doi ↗ · pubmed ↗
5D. P. Nascimento , R. W. J. G. Ostelo , M. W. van Tulder , et al., “Do Not Make Clinical Decisions Based on Abstracts of Healthcare Research: A Systematic Review,” Journal of Clinical Epidemiology 135 (2021): 136–157, 10.1016/j.jclinepi.2021.03.030.33839242 · doi ↗ · pubmed ↗
6D. P. Nascimento , L. O. P. Costa , G. Z. Gonzalez , C. G. Maher , and A. M. Moseley , “Abstracts of Low Back Pain Trials Are Poorly Reported, Contain Spin of Information, and Are Inconsistent With the Full Text: An Overview Study,” Archives of Physical Medicine and Rehabilitation 100, no. 10 (2019): 1976–1985.e 18, 10.1016/j.apmr.2019.03.024.31207219 · doi ↗ · pubmed ↗
7N. Rassy , C. Rives‐Lange , C. Carette , et al., “Spin Occurs in Bariatric Surgery Randomized Controlled Trials With a Statistically Nonsignificant Primary Outcome: A Systematic Review,” Journal of Clinical Epidemiology 139 (2021): 87–95, 10.1016/j.jclinepi.2021.05.004.34004338 · doi ↗ · pubmed ↗
8P. W. Stubbs , A. P. Verhagen , and A. B. Mc Cambridge , “Letter to the Editor Regarding “Does Vitamin C Supplementation Improve Rotator Cuff Healing? A Preliminary Study,” European Journal of Orthopaedic Surgery & Traumatology 32, no. 4 (2022): 785–786, 10.1007/s 00590-021-03040-x.34117529 · doi ↗ · pubmed ↗