Measurement properties of the Inflammatory Rasch-built Overall Disability Scale (I-RODS) in patients with Guillain–Barré syndrome
Farah Pelouto, Nowshin Papri, Juanita A. Haagsma, David R. Cornblath, Eduardo Nobile-Orazio, Eveline J. A. Wiegers, Thomas Harbo, Yusuf A. Rajabally, Pieter A. van Doorn, Kenneth C. Gorson, Caroline B. Terwee, Bart C. Jacobs, Mike C. Horton, G. Antonini, G. Antonini, S. Arends

TL;DR
This study evaluates the reliability and effectiveness of a disability scale for Guillain–Barré syndrome patients.
Contribution
The study validates the I-RODS scale for GBS using Rasch-based methods and identifies areas for improvement.
Findings
The I-RODS showed good internal consistency and construct validity but poor unidimensionality.
Significant differential item functioning was observed by geographic region.
The scale had a skew towards floor and ceiling effects, suggesting room for improvement.
Abstract
Guillain–Barré syndrome (GBS) is an acute immune-mediated polyneuropathy with a high disease impact, even after good clinical recovery. The Inflammatory Rasch-built Overall Disability Scale (I-RODS) is a Patient-Reported Outcome Measure (PROM) developed for patients with immune-mediated neuropathies that measures limitations in daily activities and social participation. It consists of 24 items, scored from 0–48. The present study aimed to validate the measurement properties of the I-RODS in patients with GBS included in the prospective International GBS Outcome Study (IGOS). The current study focussed on structural validity, cross-cultural validity, internal consistency, and construct validity of I-RODS using Rasch-based methods. The study was conducted in 1226 patients diagnosed with GBS with a median I-RODS score of 28 (IQR 10–41) 4 weeks after inclusion into IGOS. Rasch analyses…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4- —http://dx.doi.org/10.13039/501100004243Prinses Beatrix Spierfonds
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPeripheral Neuropathies and Disorders · Multiple Sclerosis Research Studies · Long-Term Effects of COVID-19
Introduction
Guillain–Barré syndrome (GBS) is an immune-mediated polyneuropathy, characterised by a rapidly progressive weakness, sensory deficits, autonomic dysfunction and a typical monophasic disease course [1, 2]. Effective treatments in the acute stage of GBS are intravenous immunoglobulins (IVIg) or plasma exchange (PE), but the clinical response to treatment is highly variable between patients [3]. Patients may show a rapid and full recovery, whilst others show no sign of improvement and require ventilation for months. Most patients require rehabilitation, and a considerable proportion have persistent motor or sensory impairment [4]. In addition to neurological deficits, GBS can exert a profound impact, placing a substantial physical, emotional, and social burden on affected individuals. Residual fatigue and pain are prevalent symptoms of GBS, significantly impacting daily functioning [5]. Severe fatigue in these conditions can substantially limit physical and cognitive abilities and in turn compromising societal participation (e.g. ability to work) and overall quality of life [6, 7] The assessment of GBS during its course is primarily based on clinical outcome measures, including the Medical Research Council Sum Score (MRC sum score) and the GBS Disability Scale (GBS-DS). Patient-Reported Outcome Measures (PROMs) are essential to determine the impact of disease from the patients’ perspective by capturing the outcomes that matter most to patients [8]. Moreover, PROMs are important to monitor disease progression and to evaluate the effect of treatment in clinical practice and trials [9]. The Inflammatory Rasch-built Overall Disability Scale (I-RODS) is one of the first PROMs that is frequently used in clinical practice and in treatment trials in GBS [10, 11]. The I-RODS was designed to assess overall disability (i.e. activity and social participation limitations) and was constructed through a Rasch analytic approach [12]. It was developed in the Netherlands in the Dutch language and consists of items that evaluate aspects of disability, summarised into one total score [13]. Items were developed based on the International Classification of Functioning, Disability and Health (ICF) and other scales to reflect limitation in activity and social participation. However, patients were not directly involved in every stage of the development process. Other potential limitations of the I-RODS are that the scale was initially developed for all types of immune-mediated polyneuropathies and not specifically for GBS and largely in patients from The Netherlands potentially restricting its applicability for international use due to cultural and linguistic variations. This is supported by a previous study, revealing item bias across different regions [11]. To date, I-RODS has not undergone rigorous validation in independent cohorts of patients with GBS. The few studies that have assessed its measurement properties focussed mainly on responsiveness [14] or were based on the same cohort of patients [10, 11, 15].
The aim of the current validation study was to assess structural validity, cross-cultural validity, internal consistency, reliability, construct validity, and floor and ceiling effects of the I-RODS in an international prospective cohort of patients with GBS from various geographic regions.
Methods
Study design
This study used data from the International GBS Outcome Study (IGOS), a prospective observational cohort study in which all patients diagnosed with GBS within two weeks of onset of weakness could be included, independent of age, clinical variant, and severity or treatment [16]. Patients were included from the following countries: Argentina, Australia, Bangladesh, Belgium, Canada, China, Denmark, France, Germany, Greece, Italy, Japan, Malaysia, Netherlands, South-Africa, Spain, Switzerland, Taiwan, United Kingdom, and United States of America. Clinical features, diagnostic findings, treatment, disease course, and outcome were collected. The I-RODS was administered at four weeks after inclusion in the cohort, a clinically relevant time point when all patients have reached clinical nadir, which is also frequently used to evaluate primary endpoints in clinical trials [17]. During this period, patients present with considerable heterogeneity in clinical outcomes, which makes this an appropriate time to evaluate disease burden and functional status.
I-RODS
The I-RODS is a PROM that assesses limitations in activity and social participation in patients with immune-mediated neuropathies. The I-RODS consists of 24 items reflecting limitations in daily activities, with response categories of 0 (“Not possible”), 1 (“Possible with effort”), and 2 (“Easy to perform”). Examples of activities are: “Wash your lower body” and “Turn a key in a lock”. The total score ranges from 0 to 48, with lower scores indicating a higher degree of patient-reported disability [13]. The I-RODS was originally developed in English and officially translated (and back translated) into Chinese, Danish, Dutch, English, French, German, Italian, Japanese, and Spanish. It was administered through paper and pencil by a treating physician. In Bangladesh, no official Bangla translation of the I-RODS was available and due to limited literacy of the patients, physicians verbally administered the questionnaire by providing item-by-item explanations in Bangla and documenting the patients’ responses on their behalf. The I-RODS version used in this study can be found in Supplementary Table 1.
Statistical analysis
Descriptive statistics of the baseline characteristics of the study population were performed using SPSS (version 28, IBM Corp., Armonk, NY, USA). Definitions of the measurement properties; content validity, structural validity, cross-cultural validity, internal consistency, construct validity, and floor and ceiling effects are presented in Supplementary Table 2. The structural validity of the I-RODS was assessed by Rasch analysis using RUMM2030Plus (RUMM Laboratory, Perth, Australia). Confirmatory Factor Analysis was performed in R, using the lavaan package [18]. A summary of the psychometric properties, analyses and criteria is presented in Table 1.Table 1. Psychometric properties, analyses and criteriaMeasurement propertyAnalysisDescription of analysisIndices/indicatorsCriteria good propertyStructural validity UnidimensionalitySeries of t–tests(Rasch analysis)Independent sets of items (8v8 for I-RODS) are used to generate separate person estimates for each individual. These person estimates are then compared to determine whether they are equivalentPercentage of significant t-testsLower 95% binomial CI </= 0.05Confirmatory factor analysis (CFA)Factorial structure examination, if all items load on a single factor (construct) unidimenstionality can be assumedChi-square p-value, Scaled Comparative Fit Index (CFI), root mean square error of approximation(RMSEA), Goodness of Fit Index (GFI), Tucker-Lewis Index (TLI) and standardised root mean square residual error (SRMR)Non-significant chi-square p-valueCFI > 0.95RMSEA < 0.06GFI > 0.95TLI > 0.95SRMR < 0.08 Local independenceRasch analysisThe level of independence of responses to items, such that the measurement is solely focussed on the underlying construct being measured is determined by examining the residual correlation matrix. After controlling for the dominant factor, there should be no important covariance amongst item responsesResidual (Pearson) correlation values (Q3 values) < 0.20 above the average correlation Rasch model fit Overall fitRasch analysisA summation of the X^2^ values from the individual items. Can be used as a general indication of the magnitude of fit across the item setX^2^, p-value > 0.01 Person fitThe degree to which a person’s response pattern reflects their overall scoreZ-standardised fit residual ± 3 Item fitThe degree to which an item works with the rest of the item set to measure an underlying construct. Item anomalies are indicated by the degree of misfitZ-standardised fit residual; X^2^ p-value ± 2.5; > 0.01Response categoryThreshold orderingThreshold map and category probability curvesAll response categories orderedInternal consistencyReliability analysisThe inter-relatedness of the items in a scale and, the ability of the measure to reliably order the individuals who are being measuredPerson Separation Index (PSI) (Rasch)Cronbach’s alpha (α) > 0.85 = excellent > 0.70Cross-cultural validityDifferential item Functioning (DIF) analysis with Analysis of Variance (ANOVA)People from different groups with the same level of the construct should have the same probability of giving a certain response to an item. If these probabilities are not the same, there is DIFp-value > 0.05Construct validityCorrelation analysisExamining the relationship between the I-RODS scores and a clinical outcome measureSpearman correlation coefficient (R) > 0.50Floor and ceiling effectsDescriptive statistical analysesPercentages of patients with the lowest and highest possible scores indicate ineffective differentiation amongst individuals with different construct levelsPercentages (%) < 15% TargetingRasch analysisComparing the ‘ability’ distribution of individuals with the ‘difficulty’ distribution of the items to see if the instrument aligns with the population estimates of the underlying construct being measuredPerson-item threshold distributionBell curve
Structural validity
Confirmatory factor analysis
Prior to applying Rasch analysis to assess the fit of the model, we examined whether the data met the core assumption of unidimensionality, which implies that scores on each item can be summed into one total score to determine the impact of GBS amongst patients. We hypothesised that a one-factor confirmatory factor analysis will reveal that all items measure the same underlying construct, i.e., the outcome that the PROM intends to measure (in this case limitations in activity and social participation). It is assessed through multiple items that represent the outcome [19]. This implies that scores on each item can be summed into one total score to determine the impact of GBS amongst patients.
Rasch analysis
The I-RODS was developed using Rasch methods, and therefore we evaluated whether the data fits the Rasch model in an independent validation sample of patients with GBS enrolled in IGOS, including patients from diverse geographic regions. To examine whether all items were contributing to the underlying construct of disability, all items were assessed for individual fit to the Rasch model (non-significant at Bonferroni-adjusted chi-square p-value, standardised (z-score) fit-residuals within ± 2.5). Additionally, the overall chi-square p-value is reported for the scale, which provides an indication of scale fit. This reflects how well the items in the PROM effectively measure the same underlying construct, and whether the responses are consistent with the expectations of the model [20]. Response category functioning was assessed to ensure that item response categories were operating as intended. The locations of the crossover points between adjacent response categories (the thresholds) should logically increase across the underlying trait, in order to represent distinctly separate response categories. It is acknowledged that Rasch fit statistics within RUMM2030 can become overpowered by large sample sizes [21]. Therefore, for the tests of item fit, scale fit, differential item functioning and unidimensionality, a random sample of n = 600 was taken in order to aid interpretation and limit type II error. Although a minimum of 500 participants is suggested, we opted for a slightly larger sample to account for the presence of extreme values (i.e. participants scoring at the floor or ceiling of the scale). The complete sample was used for all other fit indices.
Unidimensionality
When generating a total score from a set of items, it is necessary that all items measure the same underlying construct. This scale characteristic is known as unidimensionality and this is an assumption of the Rasch model. In order to test unidimensionality within the Rasch analysis, the item loadings of a residual principal components analysis are used to identify the two item sets within a scale that diverge the most. These two independent sets of items are then used to derive two separate person estimates, which should be equivalent if the notion of unidimensionality holds. A series of paired t-tests is then used to compare the separate person estimates for each individual and unidimensionality is evident when the person estimates do not differ in more than 5% of cases, with a lower bound 95% confidence interval applied [20, 22].
Local independence
Local independence means that responses to items are not correlated beyond what is explained by the underlying construct that is being measured, ensuring that each item provides independent information. All items were tested for local dependency, where the residual correlation Q3 criterion cut point to indicate dependency was taken as 0.2 above the average residual correlation [23].
Cross cultural validity
Differential Item Functioning (DIF) is a form of measurement bias which occurs when an item operates differently between pre-defined groups of individuals, conditional on the level of the construct being held steady. DIF was assessed for sex (male/female), age group (0–52/53 +), defined by the median value and, DIF by geographic region (Bangladesh/Northern-Europe/Southern-Europe/North America/Asia/Other) providing a measure of cross-cultural validity.
Internal consistency
Internal consistency refers to the interrelatedness amongst items within an instrument and when it is sufficiently high, it indicates that all the items of the instrument are measuring the same underlying construct consistently. Internal consistency reliability indices were taken as the person separation index (PSI) and Cronbach’s alpha.
Construct validity
We evaluated construct validity by comparing the I-RODS scores to frequently used clinical outcome measures for GBS, i.e. the GBS-DS [24, 25], MRC sum score [26], Overall Neuropathy Limitations Scale (ONLS) [27], and a PROM EQ-5D-5L [28]. We hypothesise a moderate to strong negative correlation between the I-RODS and the GBS-DS with Spearman’s correlation coefficient (ρ) expected to be lower than −0.6. We expect a moderate positive correlation between the I-RODS and the MRC sum score with Spearman's correlation coefficient anticipated to be greater than 0.5. Furthermore, we anticipate a strong negative correlation between the I-RODS and the ONLS with Spearman's correlation coefficient expected to be lower than −0.7. For the EQ-5D-5L domains mobility and self-care we expect a moderate to strong negative correlation with the I-RODS, with a Spearman's correlation coefficient of lower than −0.6. For the domain usual activities, we expect a strong negative correlation with a Spearman's correlation coefficient of lower than −0.7.
Floor and ceiling effects
Percentages of patients with the lowest and highest possible scores were calculated to check for potential floor or ceiling effects. The recommended limit at either end is 15% [29]. The presence of these effects suggests that the PROM may be limited in differentiating between individuals with different levels of the construct. This leads to limitations in detecting improvements and deterioration within individual patients. Floor and ceiling effects indicate a compromised measurement range, where the measurement range of the instrument does not cover the relative location range of the target population. Scale targeting was also assessed visually to provide an indication of the relative distribution of item and person locations.
Results
Study population
From the IGOS 2000 cohort we excluded 107 (5%) patients with an alternative diagnosis, 43 (2%) patients with a protocol violation, 8 (0.4%) patients with insufficient data, and 616 (31%) patients who did not complete the I-RODS at week 4. These excluded patients without I-RODS did not differ from the included patients with respect to demographical (age: p = 0.72 and sex: p = 0.59) and clinical characteristics, with the exception of the GBS-DS (p < 0.001), which was slightly lower (corresponding with slightly less severe disability) in included patients (median 2; IQR 1–4) than in excluded patients (median 3; IQR 1–4).The remaining group of 1226 patients were eligible for the current study (Fig. 1). The median age was 51 years (IQR 35–64 years), with a male to female ratio of 1.5. The median I-RODS score was 28 (IQR 10–41) at week 4. Sample demographics are presented in Table 2.Fig. 1. Flow diagram study populationTable 2Patient characteristicsI-RODSOverallBangladeshAsia^1^Northern^2^ EuropeSouthern^3^ EuropeNorth^4^ AmericaOther^5^Demographics N122618918939826012565 Age in years, median (IQR)51 (35–64)30 (18–45)52 (39.5–64.5)56 (41–67)56 (41–68)53 (34.25–63)46 (30.5–60.5)Sex (%) Male64.064.057.758.860.056.864.6 Female36.036.042.341.240.043.235.4Clinical featureGBS disability score (%) 05.00.511.13.55.85.64.6 122.39.534.915.126.930.432.3 229.427.029.633.230.421.624.6 316.218.06.919.813.520.816.9 421.934.414.321.621.516.820.0 55.110.63.26.01.94.81.5MRC sum score, median (IQR)55 (42–60)36 (22–48)59 (48–60)56 (47–60)58 (48–60)59 (52–60)54 (47–60)ONLS, mean (SD)4.7 (3.7)6.9 (3.7)3.4 (3.5)5.0 (3.4)4.0 (3.5)4.3 (3.6)4.0 (3.4)EQ-5D-5L, median (IQR) Mobility2 (0–4)3 (1–4)1 (0–2)2 (1–4)1 (0–3)1 (0–3)1 (0–3) Selfcare1 (0–3)3 (1–4)0 (0–2)1 (0–2)1 (0–2)0 (0–2)0 (0–2) Usual activities2 (1–4)3 (1–4)1 (0–2)3 (1–4)1 (0–3)2 (1–3)0 (1–3)GBS forms and variants (%) Motorsensory form57.721.245.070.167.770.460.0 Motor form24.776.727.511.315.45.621.5 MFS and MFS- GBS overlap12.21.620.613.110.016.813.8 Other^6^3.40.56.43.15.54.83.0Treatment (%) No treatment19.074.66.312.64.27.215.4 IVIg67.27.958.283.780.084.880.0 Plasma exchange12.511.635.43.015.88.01.5 Other1.35.8–0.8––3.1I-RODS Inflammatory Rasch overall disability scale, IQR Inter quartile range, MFS Miller fisher syndrome, IVIg Immunoglobulins, ONLS Overall neuropathy limitations scale, MRC Sum score medical research counsil sum score, EQ-5D-5L EuroQol 5 dimensions 5 levels^1^Asia: Japan, Malaysia, Taiwan and China (Bangladesh excluded because of low socio-economic status)Northern Europe^2^: Belgium, Denmark, Germany, Netherlands, United KingdomSouthern Europe^3^: France, Greece, Italy, Spain, SwitzerlandNorth America^4^: Canada and the United States of America^5^Other countries: Argentina, Australia and South Africa^6^Other clinical variants: pharyngo-cervical-brachial, pure sensory, ataxic or other variant
Structural validity
Confirmatory factor analysis
The fit of a single-factor model showed a chi-square value of 1049 (df 252), p < 0.001. The RMSEA = 0.069 (95% CI = 0.065–0.073); CFI = 0.998; TLI = 0.997; SRMR = 0.038; GFI = 0.999 (Table 1). These results are somewhat contradictory, as the chi-square result suggests a significant misfit from the single-factor model, but all of the other confirmatory factor analysis indices suggest a good fit to the single-factor model. The hypothesis that all items would fit a single-factor model can therefore not be fully accepted or rejected. However, the chi-square fit statistic may be sensitive to the large sample size, and the supplementary fit indices indicate that the items could be taken forward into a Rasch analysis.
Rasch analysis
A summary of the overall fit indices is provided for both the full sample and the n = 600 subsample (Table 3). Additionally, a complete table of the individual item fit is provided in Table 4 and Supplementary Table 3, and pairwise local dependency is provided in Supplementary Tables 4 and 5. The overall fit of the responses on the I-RODS deviates significantly from the Rasch model based on the chi-square fit statistic, reflecting a poor overall fit. A good person fit was observed, with fewer than 1% (8/1224) of the individuals falling outside of the accepted fit residual range. Misfit anomalies were observed for 6 items: “Are you able to catch an object (e.g. a ball)”, “Are you able to wash your lower body”, “Are you able to take a shower”, “Are you able to brush your teeth” and “Are you able to read a book/newspaper”. The items “Are you able to read a book/newspaper” and “Are you able to catch an object (e.g. a ball)” demonstrated the largest under-discrimination anomalies, indicating that these items failed to accurately discriminate between different disability levels. This suggests that these two items potentially measure a different construct compared to the other items within the scale. All response categories operated as intended, with thresholds for all items demonstrating appropriate threshold ordering (Fig. 2).Table 3. Fit of the Rasch modelFull sampleSubsampleTarget valuestotal n- (extremes) = valid n1226–(248) = 978600–(116) = 484-Number of items2424-Overall scale fit (Chi-square p)p < 0.001**p < 0.001p > 0.01Individual item fit residuals6/24 misfitting6/24 misfittingwithin ± 2.5Individual item fit (Chi-square p)7/24 misfitting2/24 misfittingp > 0.05 (Bonferroni adj)Person Fit Residuals < 1% outside range < 1% outside rangewithin ± 3.0Unidimensionality12.13% (CI 10.7–13.5%)14.46% (CI 12.5–16.4%)% of significant t-tests < 5%Local dependency15/276 pairwise significant20/276 pairwise significant < 0.2 above average Q3TargetingSatisfactorySatisfactoryaligned% sample at floor (min score)10.1%8.3%-% sample at ceiling (max score)9.4%11.0%-% sample at floor and ceiling (combined)19.5%19.3% < 15%Person separation index (PSI)0.950.95 > 0.85Response category threshold orderingAll orderedAll orderedAll orderedDIF-by-sexNo DIFNo DIFp > 0.05 (Bonferroni adj)DIF-by-age group1/24 Uniform; 2/24 Non-uniformNo DIFp > 0.05 (Bonferroni adj)DIF-by-ethnicity15/24 Uniform; 5/24 Non-uniform11/24 Uniform; 3/24 Non-uniformp > 0.05 (Bonferroni adj)DIF Differental item functioning^^Significant values p < 0.05Table 4Individual item fit for full sample (n = 1226)ItemStatementLocationSEFit residualProbability1Bend forward and pick something up0.2560.069− 2.2360.0763952Remain standing for a long time period, e.g. several hours2.0330.0690.640.0279073Walk up a flight of stairs1.4500.07− 2.2350.0059424Run3.9070.082− 1.1690.3169085Walk outdoors, up to max 1 kilometre1.6650.069− 1.3590.0814326Walk while avoiding obstacles0.4650.068− 2.1860.2334637Dance2.8280.084− 1.390.3965098Travel by public transport1.2130.073− 1.7170.000581^+^9Turn a key in a lock− 0.9930.0732.0270.00329610Carry and put down a heavy object1.5890.07− 0.8140.65451111Move a chair− 0.5240.071− 2.110.005912**12Catch an object (e.g. a ball)− **0.9360.0744.483****0^+^13Wash your upper body− 1.7460.0772.7390.01097614Wash your lower body− 0.7120.0713.5190.00766115Take a shower− 0.5740.074.6970.000156^+^16Brush your teeth− 2.6930.0852.2570.001171^+^17Sit on/go to a toilet− 1.2080.073− 0.130.05855918Dress your upper body− 1.7360.076− 1.9050.00377719Eat− 3.0890.0891.3730.77780320Do the dishes− 0.0150.079− 1.9640.000052 + 21Make a sandwich− 0.9140.082− 0.0350.04354522Do the shopping1.2920.0734.2760.000065^+^23Go to the general practitioner0.1020.073− 1.10.00290424****Read a book/newspaper− 1.658****0.0768.9150^+^Fit residuals within ±2.5 indicate acceptable fitSE* Standard errorMisfit based on fit residuals are marked (*), misfit based on chi-square (p <.05) are marked (^+^), and the item with the largest misfit is shown in boldFig. 2Threshold map for I-RODS in IGOS study cohort
Unidimensionality
The series of t-tests demonstrated that 12.1% (CI 10.7–13.5%) of the full sample and 14.5% (CI 12.5–16.4%) of the subsample report significantly different person estimates. This is more than the 5% that would be expected naturally and is suggestive that the I-RODS is not a unidimensional measurement instrument measuring solely limitations in activity and social participation.
Local independence
Lack of local independency was identified within the item set, with 15 of the 276 (5.4%) pairwise correlations suggesting dependencies. The largest of these dependencies was found between item “Are you able to wash your upper body” and “Are you able to wash your lower body”.
Cross cultural validity
There was no substantive DIF observed for sex or age group. However, significant DIF was evident for geographic region, with 11 of the 24 items displaying systematic uniform differences in item functioning. A DIF summary table is provided in Supplementary Table 6, along with the DIF-by-geographic region plots for the individual items in Supplementary Figs. 1– 24. The item “Are you able to remain standing for a long time period” was the item displaying the largest degree of DIF by geographic region, and this is presented in Fig. 3, where it can be seen that this is considered to be an ‘easier’ item in Asia than it is in Northern Europe. In order to provide a comparison between the different geographical regions, the overall expected values for each geographical region are presented for each item in Supplementary Table 7.Fig. 3DIF by region for item 2 “Remain standing for a long time period”. *Person location refers to the level of disability of each patient with mean=0 and SD=1 and the expected value refers to the expected item response based on the level of disability
Internal consistency
The I-RODS demonstrated sufficient internal consistency, with a Person Separation Index (PSI) of 0.95 with extremes and 0.96 without extremes and a Cronbach’s alpha of 0.98.
Construct validity
The expected and observed correlations between the I-RODS and other measures are presented in Table 5. As anticipated, a strong negative correlation was observed between the I-RODS and GBS-DS, indicating that lower scores on the I-RODS are associated with higher scores on the GBS-DS. A strong positive correlation was found between the MRC sum score and the I-RODS, with higher scores on the I-RODS associated with higher scores on the MRC sum score. A strong negative correlation was observed between the I-RODS and the ONLS, where higher scores on the I-RODS scores were associated with lower scores on the ONLS. A strong negative correlation was also found between the I-RODS and the three EQ-5D dimensions. Higher scores on the I-RODS were linked to lower scores on the mobility, usual activities, and self-care dimensions of the EQ-5D.Table 5. Expected and observed correlationsOutcome measureI-RODSExpected correlationObserved correlationGBS-DS ≤ −0.60− 0.91MRC sum score ≥ 0.500.77ONLS ≤ −0.70− 0.90EQ-5D-5L Mobility ≤ −0.60− 0.86EQ-5D-5L Self care ≤ −0.60− 0.86EQ-5D-5L Usual activities ≤ −0.70− 0.81I-RODS Inflammatory Rasch-built overall disability scale, GBS-DS GBS disability scale, ONLS Overall neuropathy limitations scale, MRC sum score Medical research counsil sum score, EQ-5D-5L EuroQol 5 dimensions 5 levels
Floor and ceiling effects
The I-RODS revealed that 10.1% of the patients obtained the lowest possible score and 9.4% of the patients obtained the highest possible score (Fig. 4). This is reflected in the distribution of patients that is somewhat spread out and evenly distributed, whereas we would expect a normal distribution. The person-item threshold distribution is presented in Fig. 4.Fig. 4. Person-item threshold distribution. *Person-Item Threshold Distribution depicts the distribution of item location (disability levels of the patients) relative to the distribution of item location (the disability levels represented by the scale of the items. The x-axis represents the continuum of disability with a frequency distribution of the GBS patient locations plotted above the x-axis and a frequency distribution of the I-RODS item locations plotted below the x-axis
Discussion
This study evaluated the measurement properties of the I-RODS, a widely utilised PROM for patients with GBS in both clinical practice and research settings. Using Rasch methodology, the analysis of an extensive international patient cohort revealed adequate internal consistency, acceptable targeting, and good construct validity. However, there are several notable limitations in the current version of the I-RODS.
First, I-RODS demonstrated a lack of unidimensionality. This suggests that multiple constructs instead of a single construct were measured by I-RODS and that the current practice of summing up scores on the items into one total score is inappropriate. The apparent lack of unidimensionality could also be caused by the high degree of local dependency that was found amongst the I-RODS items, which was observed across multiple pairwise correlations. This high level of local dependency could also be artificially inflating the high internal consistency reliability values (PSI = 0.95; Cronbach’s alpha = 0.98). In addition, these findings to some extent also could explain the inconclusive confirmatory factor analysis results, where the single-factor model was rejected by the Chi-square test, but all of the supplementary fit indices indicated a good fit. Taken together, these results indicate that the assumed unidimensional structure does not hold and that scores should be interpreted with caution, as they may not accurately reflect limitations in daily activity and social participation.
Second, individual item misfit was present in six items, with two items (“Are you able to read a book/newspaper” and “Are you able to catch an object (e.g. a ball)” displaying problematic under-discrimination. These two items should be reviewed by experts based on their content for revision or possible removal, as they may not be appropriate for GBS.
Third, although there was no apparent floor and ceiling effect, as the percentage at each end falls below 15% and the targeting of the I-RODS to patients with GBS was considered acceptable, there was a combined total of 19.5% of patients scoring at the floor or ceiling. This reflects a limitation of the scale because it fails to capture 19.5% of the GBS population (within our sample), which is approximately one in five patients. This tendency towards the floor and ceiling could be the result of the absence of content validity of the I-RODS or that the I-RODS does not fully capture the disability range found within GBS patients [30]. Reviewing the relevance, comprehensiveness, and comprehensibility of each item by patients with GBS and the inclusion of easier as well as more challenging items could improve the targeting of the questionnaire and minimise the proportion at the floor and ceiling. Lastly, DIF was detected for geographical region but not for age and sex. Patients from different regions vary significantly in the way that items operate, probably attributable to cultural differences. For example, the largest DIF was found for the item “Remain standing for a long time period”, with the biggest difference between regions noted between Asia and Northern Europe. This could be due to the interpretation of what is considered to be a ‘long time,’ or it could reflect the relative difference in the amount of time spent standing in the different regions. That the Asian region considered this item to be ‘easier’ suggests that standing for a long period is easier for respondents from Asia than it is for respondents from Northern Europe. The presence of DIF for region can limit the comparability of scores across different languages or regions, which can be problematic given that the I-RODS is used globally. Our findings regarding the cross-cultural validity of the I-RODS are in line with previous research [11]. However, the current study extends on previous analyses by including patients from Asia and a larger number of patients from other geographic regions.
The existing literature on the validation of the I-RODS is very limited. Only a few studies have assessed its responsiveness and construct validity, with the quality of the methods of the studies on responsiveness being suboptimal; for example, the small sample of 55 patients was insufficient to adequately assess responsiveness [10, 14, 15]. For the construct validity of the I-RODS, the predefined hypotheses on the correlation between the I-RODS and GBS-DS, MRC sum score, ONLS, and EQ-5D-5L mobility, usual activities, and self-care were confirmed with strong correlations (−0.91–0.77). Construct validity was assessed in previous studies by examining the correlations between the I-RODS and the EQ-5D and a performance-based test (vigori-dynamometer) [10, 15]. However, no predefined hypotheses were formulated, which is currently a requirement for the evaluation of construct validity according to the COSMIN guideline [30]. In a recent systematic review on the measurement properties of PROMs for patients with polyneuropathies, we formulated hypotheses for testing the construct validity of the I-RODS and other PROMs, and the results of the studies mentioned above were not in line with these hypotheses [31]. Reasons for this inconsistency are unclear. Moreover, the involvement of patients in the development of the I-RODS was indirect, as they reviewed items for another questionnaire in a prior study, from which some items were later incorporated into the I-RODS [32]. Patients completed the preliminary version of the I-RODS. However, they were not involved in a concept elicitation phase, which is considered a crucial step for establishing content validity according to the COSMIN guideline [19, 33]. The importance of patient involvement and qualitative studies in establishing adequate content validity is also emphasised by the US Food and Drug Administration (FDA) [9]. This scarcity and methodological limitations of research on the measurement properties of the I-RODS underscore the importance of the current findings and the implications for the use of the current I-RODS scale in future clinical trials within a GBS sample.
The differences in findings between this study and the development study may be attributed to several factors. First, this study included a larger, multinational sample consisting solely of GBS patients, whereas the development study involved a relatively small Dutch cohort with a mixed patient population (GBS, chronic inflammatory demyelinating polyneuropathy, and gammopathy-related polyneuropathy). Second, the development study assessed clinically stable patients and in a later stage of the disease, whilst this study used data collected at the four-week timepoint, when the condition of patients was likely to be more severe. Third, the advancement of analytical methodology, such as for example, the assessment of local dependency, which is now conducted with greater rigour than at the time the I-RODS was developed.
Considering the evaluation of the measurement properties of the I-RODS, our results suggest a need for a more psychometrically valid PROM in the context of assessing disability in patients with GBS. A possible alternative could be the use of high-quality generic PROMs from the Patient-Reported Outcomes Measurement Information System (PROMIS) [34]. We recommend using multiple domains of PROMIS that are considered to be relevant for patients with GBS, such as fatigue, physical function, and ability to participate in social roles and activities.
The high-quality nature of the data, achieved through a predefined study protocol, prospective collection, and regular data quality controls during the study conduct, is a key strength of this validation study. The international cohort, including patients from both high- and low-income countries, contributed to the robustness of the study and allowed for the analysis of cross-cultural validity. Furthermore, the substantial sample size provided an adequate reflection of the GBS population. The availability of multiple outcome measures in IGOS allowed us to evaluate construct validity by comparisons with various other relevant outcome measures. A limitation of the study is the focus on a single time point rather than multiple time points. Test–retest reliability and responsiveness were beyond the scope of this study. Future studies should consider adopting a longitudinal approach with high-quality methods. Another limitation is that, in Bangladesh, the questionnaires were administered verbally by the physicians, which may have introduced interviewer bias and differences in interpretation of the items. Nonetheless, this approach addresses literacy-related barriers and facilitates broader participation and representativeness of the study population. A final limitation is the relative underrepresentation of the most severely affected patients within this cohort. This can be reasonably explained by the difficulty these patients experience in completing the I-RODS in the acute phase of GBS. However, it is important to note that the primary aim of this study is to assess the measurement properties of the PROM, rather than to provide a representative depiction of GBS disease burden. In conclusion, this study found evidence for adequate internal consistency, acceptable targeting, and good construct validity but insufficient structural validity and cross-cultural validity. This means that the scores on the I-RODS may not accurately reflect one construct, namely, limitations in daily activities and social participation. Some items seem closely related to each other in a way that suggests the PROM contains a degree of dependency or redundancy. Additionally, some items might be picking up on another construct beyond limitations in daily activity and social participation. Moreover, it might not perform consistently across different cultures or languages, and the difficulty of the items doesn’t always match the level of disability observed in this patient sample. We therefore propose to revise and improve the current version of the I-RODS, use another PROM, or develop a new PROM focussed on various relevant domains of quality of life relevant for GBS patients.
Supplementary Information
Below is the link to the electronic supplementary material.Supplementary file1 (DOCX 367 KB)
