Properties of Measurements of Patient‐Reported Outcome Measures Physical Activity Assessment in Chronic Kidney Disease: A Systematic Review
Thamiê Cristina Stella, Inaê Silva Santos, Graziella Alves da Silva, Cid André Gomes, Luciana Maria Malosá Sampaio

TL;DR
This systematic review evaluates physical activity questionnaires for chronic kidney disease patients, finding limited consensus and highlighting the CKD-PAQ as a promising tool.
Contribution
Identifies psychometric properties of PROMs for physical activity in CKD and highlights the CKD-PAQ as the only tool with favorable evaluations.
Findings
Only the CKD-PAQ showed satisfactory results in both COSMIN RoB and EMPRO evaluations.
There is no consensus on the best PROM for assessing physical activity in CKD.
Few studies focus on early-stage CKD or evaluate responsiveness of PROMs.
Abstract
The assessment of physical activity (PA) using Patient‐Reported Outcome Measures (PROMs) is routine due to its easy applicability and low cost. In the context of CKD, there are several studies that evaluate PA through PROM, but there is heterogeneity in the choice of tool. Therefore, the objective of this study was to identify, evaluate and synthesize the psychometric properties of PROMs used in the assessment of PA in CKD. A systematic review was carried out according to the PRISMA guidelines in PubMed, SciELO, Medline, Lilacs and EMBASE databases. The assessment of the quality of the studies was evaluated using the COSMIN Checklist and EMPRO tool. 15,137 studies were found, with 17 included. Eleven PROMs were found, of which 2 were elaborate to the CKD population: Chronic Kidney Disease Physical Activity Questionnaire (CKD‐PAQ) and Low Physical Activity Questionnaire (LoPAQ).…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
FIGURE 1| Study | Study design | Population |
| PROM(s) | Psychometric properties | Comparative test | Statistics | Cutoff points | Outcomes |
|---|---|---|---|---|---|---|---|---|---|
| Hadjiioannou et al. ( | Controlled, non‐randomized, single‐center pragmatic clinical trial |
Non‐dialysis and dialysis CKD Renal transplant (excluded from this evaluation) |
40 individuals (total), but 12 transplanted. Only 28 individuals evaluated in this review Age (mean): 56.7 years Gender: 20 males Sample wasn't calculated | DASI |
Absolute Test‐retest | NA |
| ND |
SEM = 3.95 MCID = 9.21 ICC = 0.9
SEM = 3.0 MCID = 8.57 ICC = 0.87 |
| Hatef et al. ( | Cross‐sectional observational study | CKD dialysis (HD) |
404 individuals, among which 202 selected for exploratory statistics and 202 selected for confirmatory statistics Age (mean): 56.8 ± 12.04 years Gender: 58% male Sample was calculated | ESES (Persian language) |
Face Content Construct (structural) Convergent Construct Internal consistency | NA |
|
AGFI > 0.8
|
AGFI = 0.76 RMSEA = 0.06
|
| Huang et al. ( | Multicenter cross‐sectional observational study | CKD dialysis (HD) |
85 individuals, among which 29 took the test twice for test‐retest evaluation Age (mean): 62.3 ± 11.8 years Gender: 48 males Sample wasn't calculated |
Subdivided in six categories: Sitting, walking, light activity, moderate activity, vigorous activity and total activity |
Content Criterion Internal consistency Test‐retest | Actigraphy by |
|
Good: 0.60—0.74 Fair: 0.40—0.59 Poor < 0.40 |
Sitting = 0.22 Walking = 0.47 Total activity = 0.44
Sitting = 0.43 Walking = 0.30 Light activity = 0.49 Moderate activity = 0.59 Vigorous activity = 0.66 Total activity = 0.47
|
| Johansen et al. ( | Cross‐sectional and multicenter observational study | CKD dialysis (HD) |
39 individuals, among which 29 took the test twice for test‐retest evaluation Age (mean): 52 ± 16 years Gender: 26 males Sample wasn't calculated |
Subdivided in two scores: MAS (maximum activity score) and AAS (adjusted activity score)
|
Criterion validity Test‐retest reliability |
Accelerometer Physical performance through assessment of time for activities |
Test‐retest reliability: Not specified | ND |
Accelerometer correlation = 0.78 Performance assessment correlation = 0.72
Accelerometer correlation = 0.,73 Performance assessment correlation = 0.75 Test‐retest reliability = 0.91
Accelerometer correlation = 0.66* Performance assessment correlation = 0.50*
Accelerometer correlation = 0.59 Performance assessment correlation = 0.36 |
| Johansen et al. ( | Cross‐sectional and multicenter observational study | CKD dialysis (HD) |
68 individuals Age (mean): 59.0 ± 14 years Gender: 59% male Sample wasn't calculated |
Subdivided in walking and total activity |
Construct (hypothesis testing) Convergent |
Short form (36) health survey (SF‐36), physical component Short physical performance battery (SPPB) Minnesota leisure time activities questionnaire (LTA) | Spearman's correlation | ND |
Walking = 0.58 Total activity = 0.62
|
| Kittiskulnam et al. ( | Cross‐sectional and multicenter observational study | CKD dialysis (HD and PD) |
60 individuals (48 on HD and 12 on PD) Age (mean): 58,0 ± 12,7 years Gender: 78,3% male Sample wasn't calculated |
Subdivided in walking and total activity |
Construct (hypothesis testing) Criterion |
Short form (36) health survey (SF‐36), physical component Short physical performance battery (SPPB) Pedometer | Pearson's correlation | ND |
Total activity = 0.35
No statistical difference between dialysis modalities |
| Lou & He, ( | Multicenter cross‐sectional observational study | CKD dialysis (HD) |
320 individuals, among which 103 took the test twice for test‐retest evaluation. Age (mean): 58.6 years Gender: 200 males Sample wasn't calculated |
Subdivided in four categories: Walking, moderate activity, vigorous activity and total activity |
Criterion validity Test‐retest reliability | Pedometer OMRON HJ‐328 utilized for 7 days |
|
Moderate: 0.5 Small: 0.2
|
Walking = 0.407 Moderate activity = 0.43 Vigorous activity = 0.28 Total activity = 0.561
Walking = 0.90 Moderate activity = 0.62 Vigorous activity = 0.99 Total activity = 0.84 |
| Overend et al. ( | Observational test‐retest and single‐center study | CKD dialysis (HD) |
25 individuals Age (mean): 67.2 years Gender: 14 males Sample was calculated |
|
Absolute Test‐retest | NA |
|
|
SEM = 5.5 MCID = 15.1 ICC = 0.76
SEM = 4.1 MCID = 11.4 ICC = 0.92 |
| Ravani et al. ( | Cross‐sectional, single‐center observational study | Non‐dialysis CKD (stages 3 and 4) |
43 individuals, among wich 23 for reliability assessment and 20 for validity assessment. Ages (mean): 58.0 ± 12.7 years Gender: 78.3% male Sample was calculated | DASI |
Test‐retest reliability Criterion validity | Cardiopulmonary exercise test (CPET) |
| ND |
CKD stage 3 = 0.71 CKD stage 4 = 0.81
|
| Robinson‐Cohen et al. ( | Prospective single‐center cohort study | Non‐dialysis CKD |
46 individuals Age (mean): 55 ± 11 years Gender: 54% male Sample wasn't calculated |
| Criterion validity | Actigraphy by | Pearson's correlation | ND |
|
| Rosa et al. ( | Multicenter cross‐sectional observational study | CKD dialysis (HD) |
40 individuals Age (mean): 54 ± 16 years Gender: 21 females Sample wasn't calculated |
Subdivided in light activity, moderate to vigorous activity and total activity | Criterion validity | Actigraphy by | Spearman's correlation | ND |
Light activity = 0.34 Moderate to vigorous activity = 0.38 Total activity = 0.41 |
| Sridharan et al. ( | Prospective cross‐sectional study | Non‐dialysis CKD |
40 individuals, among which 21 from stages 1–3 and 19 from stages 4–5. Age (mean): 54.1 ± 17.4 years Gender: 22 males Sample wasn't calculated |
| Criterion validity | Total energy expenditure (TEE) from the double labeled water test. | Bland‐altman test | ND |
RPAQ = 0.57 PAR = 0.23 |
| Sridharan et al. ( | Longitudinal cohort study | CKD all stages |
266 individuals, among which 40 for qualitative assessment Age (mean): 58.3 ± 15.1 years Gender: 57.6% males
523 individuals Age (mean): 60.8 ± 16.1 years Gender: 63.7% male Sample wasn't calculated |
MET value TEE value |
Construct (hypothesis testing) in development phase Convergent in final phase |
Accelerometer RPAQ questionnaire | ICC and bland‐altman test | ND |
ICC (accelerometer): MET value = 0.45 TEE value = 0.75 ICC (RPAQ): MET value = 0.90 TEE value = 0.91
ICC (RPAQ): MET value = 0.83 TEE value = 0.92 |
| Tabib et al. ( | Cross‐sectional study | CKD dialysis (HD) |
109 individuals Age (mean): 64 ± 11 years Gender: 56% male Sample wasn't calculated | LoPAQ (Persian language) |
Content Convergent
|
Community healthy adults model program for seniors (CHAMPS) questionnaire Short form (36) health survey (SF‐36), physical component Short physical performance battery (SPPB) |
ICC |
Good 0.60–0.74 Fair 0.40–59 Poor < 0.40 |
|
| Wellard ( | Cross‐sectional observational study | CKD dialysis (HD) |
65 individuals Age (mean): 62 ± 13.4 years Gender: 47 males Sample wasn't calculated |
| Convergent validity | Sickness impact profile (SIP) questionnaire | Spearman's correlation | ND |
|
| Wilkinson et al. | Secondary analysis of the research group's database of previous studies | CKD all stages |
40 individuals, among which 11individuals for test‐retest reliability assessment Age (mean): 62.5 ± 11.1 years Gender: 21 females Convenience sample | GPPAQ |
Criterion validity Test‐retest reliability | Accelerometer GENEActiv for 7 days |
Positive predictive value (PPV) and negative predictive value (NPV)
| ND |
Sensitivity = 54.5% Specificity = 96.6% PPV = 85.7% NPV = 84.9% Accuracy = 85% Kw = 0.74 |
| Yamabe et al. ( | Cross‐sectional observational study | CKD dialysis (HD) |
220 individuals Age (mean): 67.8 ± 11.6 years Gender: 59.1% male Sample wasn't calculated |
Subdivided in sitting, walking and total activity |
Criterion validity Cross‐cultural adaptation from Japanese language | Pedometer |
|
Small: 0.2 |
Sitting = −0.14 Walking = 0.53 Total activity = 0.49 |
| PROM | Psychometric properties | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Construct validity (structural) | Construct validity (hypotheses testing) | Content validity | Criterion validity | Convergent validity | Cross‐cultural validity | Internal consistency | Reliability | Measurement error | |
|
| — |
(+) Moderate | — | — |
(+) Moderate | — | — | — | — |
|
| — | — | — |
(−) Low | — | — | — |
(+) Moderate |
(+) Low |
|
|
(−) High | — |
(?) Low | — |
(−) High |
(?) Moderate |
(+) High |
(+) Low | — |
|
| — | — | — |
(−) Low | — | — | — | — | — |
|
| — | — | — |
(±) Low | — | — | — |
(+) Low | — |
|
| — | — | — |
(±) Moderate |
(+) Moderate | — | — |
(+) Low |
(+) Low |
|
| — | — | — |
(−) High | — | — | — |
(+) Moderate | — |
|
| — |
(−) High |
(+) Low |
(±) High |
(−) High |
(?) Very low |
(?) Very low |
(±) Moderate | — |
|
| — | — | — |
(−) Moderate | — | — | — | — | — |
|
| — | — | — |
(−) Moderate | — | — | — | — | — |
|
| — | — | — |
(−) Low | — | — | — | — | — |
| PROM | EMPRO attributes | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Conceptual and measurement | Reliability (internal consistency) | Reliability (reproducibility) | Validity | Responsiveness | Interpretability | Burden (respondent) | Burden (administrative) | Alternative modes of administration | Cultural and language adaptations | |
|
|
| — | — |
| — |
| — | — | — | — |
|
| — | 33.3 |
| 22.2 | — |
| — | — | — | — |
|
|
|
| 16.7 | 38.9 | — | — | 22.2 | 22.2 | — |
|
|
| — | — | — | 22.2 | — | 44.4 | — | — | — | — |
|
| — | NA |
|
| — |
| — | — | — | — |
|
| — | 16.6 |
| 12.9 | — |
| — | — | — | — |
|
| — | NA |
| 12.9 | — | 44.4 | — | — | — | — |
|
|
| 8.3 | 29.1 | 39.9 | — |
| — | — | — |
|
|
| — | NA | 33.3 | 13.9 | — |
| — | — | — | — |
|
| — | NA | 33.3 | 13.9 | — | 49.9 | — | — | — | — |
|
| — | — | — | 22.2 | — |
| — | — | — | — |
| PROM | Assessment tools | Psychometric properties | |||||
|---|---|---|---|---|---|---|---|
| Desing and concept | Validity | Reliability | Responsiveness | Interpretability | Cross‐cultural validity | ||
|
|
COSMIN RoB EMPRO |
|
| — | — |
—
| — |
|
|
COSMIN RoB EMPRO | — |
|
| — |
—
| — |
|
|
COSMIN RoB EMPRO |
|
|
| — | — |
|
|
|
COSMIN RoB EMPRO | — |
| — | — |
—
| — |
|
|
COSMIN RoB EMPRO | — |
|
| — |
—
| — |
|
|
COSMIN RoB EMPRO | — |
|
| — |
—
| — |
|
|
COSMIN RoB EMPRO | — |
|
| — |
—
| — |
|
|
COSMIN RoB EMPRO |
|
|
| — |
—
|
|
|
|
COSMIN RoB EMPRO | — |
| — | — |
—
| — |
|
|
COSMIN RoB EMPRO | — |
| — | — |
—
| — |
|
|
COSMIN RoB EMPRO | — |
| — | — |
—
| — |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDialysis and Renal Disease Management · Physical Activity and Health · Peripheral Artery Disease Management
Introduction
1
According to World Health Organization (WHO), physical activity (PA) is an important factor in the control and protection of chronic diseases (Bull et al. 2020). Chronic Kidney Disease (CKD) causes systemic alterations that impact the individual's functionality (Roshanravan et al. 2017), and the reduction in PA is proportional to renal function (Clarke et al. 2018). The disease is divided into six functional stages, where stage 1 is the mildest and stage 5 is the terminal phase requiring dialysis, and characterizing the most inactive part of CKD (KDIGO. Kidney Disease Improve Global Outcomes; Roshanravan et al. 2017). Stimulating PA in CKD brings benefits, such as increased survival, quality of life, and better control of comorbidities (Wilkinson et al. 2025).
Assessment of PA in CKD is particularly challenging. The gold standard method of measurement energy expenditure using doubly labeled water (DLW) is compromised in CKD due to impaired renal function in filtering isotopes (Sridharan et al. 2022). However, instruments objective PA assessment by wearable devices (e.g. accelerometers, pedometers and actigraphy) does not show good accuracy in distinguishing different levels of PA in predominantly inactive populations, such as in CKD (Sylvia et al. 2014).
The use of Patient‐Reported Outcome Measures (PROMs) for PA evaluation is routine due to its easy applicability and low cost (Sattler et al. 2021). In the CKD context, there are two specific PROMs published: the Low Physical Activity Questionnaire (LoPAQ) and the Chronic Kidney Disease Physical Activity Questionnaire (CKD‐PAQ). LoPAQ was created to discriminate PA levels in individuals on dialysis, the mostly sedentary group. However, it is limited by only covering the terminal phase of CKD (Johansen et al. 2015). In turn, CKD‐PAQ includes all stages of CKD (Sridharan et al. 2022), but it is a new PROM with no literature about applicability yet. For these reasons, generalist PROMs are still used in CKD, with heterogeneity in choice (Battaglia et al. 2024; Bakker et al. 2021).
Although the use of PROMs to assess PA in CKD is stimulating (Battaglia et al. 2024), the lack of uniformity in choice of PROM may compromise the practicability of the results. The COSMIN initiative (COnsensus‐based Standards for the selection of health Measurement Instruments) highlights the importance of systematic reviews of psychometric properties of PROMs for the standardization of clinical choice (Mokkink et al. 2016). Measurement properties (Validity, Reliability, and Responsiveness) are objective concepts that allow the evaluation of a subjective attribute, such as PA (Mokkink et al. 2016). Therefore, the objective of this study was to identify, evaluate and synthesize the psychometric properties of PROMs used in the assessment of PA in CKD.
Methods
2
This is a systematic review conducted according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta‐Analyses) (Page et al. 2021) guidelines and registered in PROSPERO (International Prospective Register of Systematic Reviews) under ID CRD42022312143. The PRISMA Checklist is available in Supporting Information S1: Table S1. Two independent reviewers conducted a parallel search and selection of studies in the PubMed, SciELO, Medline, Lilacs, and EMBASE databases.
As this study is a systematic review of measurement properties, the COSMIN initiative was used both to outline the research question and to choose descriptors (Prinsen et al. 2018). There were used four descriptors disease related (Chronic Kidney Disease; Chronic Renal Insufficiency; Chronic Kidney Failure; Dialysis), two construct related (Physical Activity; Physical Activities), and 12 descriptors psychometric related (Psychometrics; Patient Outcome Assessment; Outcome Assessment, Health Care; Reproducibility of Results; Reliability of Result; Validity of Result; Reliability and Validity; Test Retest Reliability; Validity, Face; Finding Reproducibility; Validation studies; Cross Cultural Comparison). The Boolean operators AND and NO were. The search strategy and the adaption for each database are available in Supporting Information S1: Tables S2 and S3, respectively.
The RAYYAN database was used for screening (Ouzzani et al. 2016). An indirect search was then carried out in the bibliographic references of the included studies, following the same screening. In case of disagreement among the evaluators, the resolution was carried out by consensus or by a third independent reviewer.
There were included studies with a sample of individuals aged 18 years or older, with CKD disease at any stage, whose clinical measure was PA and with the evaluation of at least one of the following properties of the instruments: validity, reliability and/or responsiveness. Abstracts of conference proceedings, literature reviews, and studies that did not use PROMs as the outcome variable for the analysis of physical assessment were excluded. No language restrictions were used.
The search was carried out between January and July 2022, with an update in May 2023 with the inclusion of the fourth descriptor disease related and in December 2024 due to evidence of new publications of interest.
Data Extraction
2.1
Data from the included articles were extracted independently by each reviewer. To standardize the process, a data extraction form was created based on PRISMA recommendations (Page et al. 2021) and extraction tables exemplified by COSMIN (Mokkink et al., 2017).
Psychometric Properties
2.2
The present study aimed to evaluate the psychometric variables commonly assessed in PROMs: reliability, validity and responsiveness.
Reliability is the degree or quality of consistency with which the instrument's items measure the construct and allows reproduction and obtaining consistent results when applied on different occasions (Echevarría‐Guanilo et al. 2017). It includes attributes of internal consistency (assessed by Cronbach's Alpha Coefficient—Coefα), measurement error (assessed by Standard Error of Measurement—SEM), test, and inter‐rater reliability (both evaluated by Intraclass Correlation Coefficient—ICC—or by Pearson or Spearman Correlation) (Luján‐Tangarife and Cardona‐Arias 2015).
Validity is the instrument's ability to measure the proposed construct for which it was designed (Echevarría‐Guanilo et al. 2017). It is composed of attributes such as face validity (no statistical evaluation, qualitative assessment), content validity (assessed by exploratory factor analysis—EFA), construct validity (assessed by Content Validity Index—CVI—and Content Validity Ratio—CVR), criterion validity and divergent/convergent validity (both evaluated by Pearson or Spearman Correlation) (Luján‐Tangarife and Cardona‐Arias 2015).
Responsiveness is the ability of the instrument to detect changes over time in the construct being measured, and it's often utilized to evaluate disease progression or treatment response (Echevarría‐Guanilo et al. 2017).
Assessment of the Quality of Studies
2.3
To assess the quality of the studies, we used the COSMIN Risk of Bias (COSMIN RoB) Checklist (Mokkink et al., 2017) and the EMPRO (Evaluating Measures of Patient‐Reported Outcomes) tool (Valderas et al. 2008). The application of the instruments was carried out independently by each reviewer, and cases of disagreement were resolved by means of consensus or evaluation by a third reviewer.
The choice of tools was based on a previous study that verified the performance of different psychometric assessment tools (Lorente et al. 2020): while COSMIN is the most widespread tool to assess measurement properties of PROMs health related; EMPRO tool, on the other hand, is less complex and includes a more thorough assessment of attributes such as feasibility, application burden and interpretability, attributes that are not explored in COSMIN (Lorente al. 2020).
Checklist COSMIN RoB
2.3.1
The COSMIN RoB checklist has 10 sections, with 4–35 items each, which evaluate: Reliability (contains the measurement properties internal consistency, reliability and error measurement), Validity (contains the content validity, criterion validity, and construct validity properties), and Responsiveness (Mokkink et al. 2017). Each measure property aspect evaluates to “+” (sufficient), “−” (insufficient), or “?” (undetermined). Then, the measurement property is classified according to the risk of bias, inconsistency, imprecision, and ambiguity. In the end, the quality of the property is rated as “High”, “Moderate”, “Low” or “Very Low” (Mokkink et al. 2020).
The complete tool and manuals are available for free access and use in the English language (Mokkink et al. 2020).
EMPRO Tool
2.3.2
The EMPRO tool was developed by Valderas et al. (2008) and has 39 items distributed in 8 attributes: Conceptual and measurement model; Reliability; Validity; Responsiveness; Interpretability; Overload; Alternative management models, Linguistic and cross‐cultural adaptations. For each item, there is a 4‐point Likert scale of agreement, where “4” is “Strongly agree” and 1 is “Strongly disagree”. The score can reach from “0” (worst score) to “100” (best score), with values equal or greater than “50” considered acceptable for the attribute (Valderas et al. 2008).
The EMPRO tool requires a license application through the electronic portal https://bibliopro.org and is available in English and Spanish. There are still few published studies using it and none evaluating PROMs of PA.
Results
3
In total, 15,137 articles were found, of which 14 were selected for inclusion in the systematic review. In the secondary search of the bibliographic references of the included articles, there was the inclusion of another 3 articles not found in the initial search, totaling 17 articles selected for evaluation (Figure 1). The year of publication ranged from 2001 to 2024, with a peak of publication in 2019 (3 studies). All studies were published in English. The Table 1 shows the description of the included studies.
PRISMA flow chart.
Eleven different PROMs were found, which only 2 are specific for CKD patients: CKD‐PAQ (Sridharan et al. 2022) and LoPAQ (Sridharan et al. 2022; Huang et al. 2021; Johansen et al. 2015; Kittiskulnam et al. 2019; Tabibi et al. 2024). In non‐specific PROMs, were found evaluation of Duke Activity Status Index (DASI) (Yamabe et al. 2023; Hadjiioannou et al. 2019), Exercise Self‐Efficacy Scale (ESES) (Ravani et al. 2012), Four Week Physical History Questionnaire (FWH) (Hatef et al. 2018), General Practice Physical Activity Questionnaire (GPPAQ) (Robinson‐Cohen et al. 2013), Human Activity Profile (HAP) (Wilkinson et al. 2022; Johansen et al. 2001; Overend et al. 2010; Robinson‐Cohen et al. 2013), International Physical Questionnaire (IPAQ) (Wellard 2003; Lou and He 2019; Robinson‐Cohen et al. 2013), Stanford 7‐Day Physical Activity Recall (PAR) (Rosa et al. 2015; Johansen et al. 2001), Physical Activity Scale for the Elderly (PASE) (Sridharan et al. 2016; Johansen et al. 2001) and Recent Physical Questionnaire (RPAQ) (Robinson‐Cohen et al. 2013).
All stages of CKD were included, but with emphasis on stages 3b–5. ESES and LoPAQ presented the evaluation of the highest number of measurement properties. The most studied psychometric property was criterion validity, in 10 (58%) studies (Sridharan et al. 2016; Huang et al. 2021; Johansen et al. 2001; Kittiskulnam et al. 2019; Lou and He 2019; Ravani et al. 2012; Robinson‐Cohen et al. 2013; Rosa et al. 2015; Sridharan et al. 2016; Wilkinson et al. 2022). The main instrument of comparison was actigraphy, in 3 studies (18%) (Yamabe et al. 2023; Huang et al. 2021; Robinson‐Cohen et al. 2013). Only 1 (6%) study used the DLW method (Rosa et al. 2015).
The evaluation of the methodological quality of the articles by COSMIN RoB and EMPRO is available in Supporting Information S1: Tables S4 and S5. The main reasons that reduced the quality of the studies were small sample size, unsatisfactory numerical values of the results, and incomplete methodological description. In summarization of the level of evidence according to the COSMIN RoB (Table 2), the best performances in validity were obtained in the CKD‐PAQ for construct and convergent validity and in HAP questionnaire for convergent validity. For reliability, DASI and IPAQ presented sufficient results, while ESES presented satisfactory results in the evaluation of internal consistency.
In summarization of the results using the EMPRO tool (Table 3), only CKD‐PAQ and GPPAQ presented satisfactory validity values. DASI, ESES, GPPAQ, IPAQ and HAP questionnaires showed acceptable reliability values. COSMIN RoB and EMPRO tool agreement in the evaluation of CKD‐PAQ, DASI, FWH, PAR, PASE and RPAQ. Among them, only CKD‐PAQ presented satisfactory evidence in the properties evaluated by the two tools (Table 4).
Discussion
4
The present study is the first systematic review of psychometric assessment of PA PROMs in CKD. The literature already has systematic reviews in CKD in the psychometric evaluation of clinical tests of physical function and PROMs for the evaluation of quality of life and symptoms (Sridharan et al. 2016; Aiyegbusi et al. 2017), but not of PA itself. In this review, the evidence found is relatively recent (from 2001 to 2024), with few studies (17 in total) and with great heterogeneity in choice of PROM (11 different PROMs), psychometric assessment and methodological quality.
In CKD, although there is evidence of the importance of maintaining PA, the assessment and determination of clinically important levels still lacks documented clarity (MacRae et al. 2023). Wilkinson et al. (2025) pointed out that the functional assessment of individuals with CKD was not routine, probably due to the lack of guidelines on which tests to use. Thus, this study has clinical relevance for evaluating which PA PROMs are used in CKD and their performance in this population group.
Validity was evaluated in all PROMs found. This finding is expected, considering that validity investigates whether the instrument evaluated has the capacity to measure the intended construct (Bakker et al. 2021). On the other hand, reliability was evaluated in only 6 (54%) PROMs. Reliability investigates the instrument's ability to measure the construct without inaccuracies (Echevarría‐Guanilo et al. 2017). Therefore, the available evidence seeks to support the use of PA PROMs in the context of CKD but requires further research on precision.
The choice to assess the criterion validity is also understandable. This attribute compares the instrument of interest with a widely accepted and used clinical tool (Echevarría‐Guanilo et al. 2017). The comparison of PA PROMs with objectives and wearable dispositives brings clinical relevance to the findings of the studies included in this review. However, these dispositives are reporting low sensitivity in differentiating activity levels in predominantly inactive populations (Echevarría‐Guanilo et al. 2019). This explains the poor performance of most PROMs in assessing validity but creates a debate about which is the best comparison instrument for validating.
Among the non‐CKD‐specific PROMs, HAP had the best performance in the evaluation of validity and reliability. This is probably because the PROM was originally designed for Chronic Obstructive Pulmonary Disease (COPD) (Sylvia et al. 2014), a chronic disease that leads to functional impairment. We hypothesized that HAP had better performance because it was designed for a mostly inactive target population. The Italian Society of Nephrology cites HAP as an option to assess PA in CKD, but highlights that the PROM's length PROM (94 items) may be a limitation for clinical use (Fix and Daughton 1988).
Regarding CKD‐specific PROMs, all indexed literature about them was included in this review. LoPAQ was first described by Bakker et al. (2021) and has 5 published studies (Johansen et al. 2015; Huang et al. 2021; Johansen et al. 2015; Kittiskulnam et al. 2019; Tabibi et al. 2024) and CKD‐PAQ was first published by Yamabe et al. (2023). While CKD‐PAQ proposes to be a PROM more comprehensive and to assess, screen, and prevent the functional decline of milder staging of the disease, LoPAQ, seeks to better distinguish PA levels between dialysis patients, who suffer greater functional impact.
In this review, LoPAQ showed unsatisfactory performance in all properties evaluated. The authors of the studies point out the similarity of these results with other studies that evaluated and validated PROMs against objective dispositives (Sridharan et al. 2022; Huang et al. 2021). Therefore, the insufficient results of LoPAQ found in these studies are not capable of excluding the clinical relevance of this PROM but show the need to validate the LoPAQ with instruments more sensitive to sedentary profile. It is noteworthy that LoPAQ also presented inconsistent results for reliability. Only two studies investigated the measurement property (Kittiskulnam et al. 2019; Huang et al. 2021), with differences in sample number and interval time for test‐retest. Hence, it is necessary that the reliability of LoPAQ be studied carefully.
In summary of the results, only the CKD‐PAQ presented satisfactory results in both COSMIN RoB and EMPRO tools. However, some points need to be taken into consideration. First, there is only one study available and only validity property was evaluated (Tabibi et al. 2024), still requiring reliability assessment. Furthermore, validity was assessed based on the PROM from which the CKD‐PAQ was developed (RPAQ), making it necessary to evaluate the CKD‐PAQ with other comparative instruments.
Most of the studies have focused on severe stages of CKD. Although functional decline is more noticeable in the most severe stages, studies demonstrate an impact on functionality even in the early stages of CKD (Sridharan et al. 2022; Wilkinson et al. 2025), which shows a gap in research in the early stages. Furthermore, it is noteworthy that responsiveness was not evaluated in any study. Responsiveness is a clinically useful property for evaluating the effect of a therapeutic treatment, which was not the objective or context of any of the studies evaluated. This finding demonstrates another gap in research in individuals with CKD, which is the assessment of the level of PA after therapies.
The present study is also distinguished by using two methodological evaluation tools. The COSMIN initiative is widely used, and it is considered the gold standard in studies related to psychometric properties (Clarke et al. 2018). EMPRO, on the other hand, is a new and still less used tool, but with differentials in its evaluation that make it an interesting option (Prinsen et al. 2018). To date, this is the first published study to use the EMPRO tool for PROMs assessing PA.
COSMIN RoB and EMPRO showed disagreements with each other in the psychometric evaluation of 5 PROMs (45.5%). Rosenkoetter and Tate (2017) observed that the COSMIN RoB items mostly evaluate aspects of the methodological design of psychometric properties, while the EMPRO has a mixed composition of items, including attributes of clinical interest such as application burden and feasibility. However, COSMIN RoB is more robust in guiding usage and discriminating the attributes of each measurement property. Thus, the choice of assessment tool should be based on the researcher's objectives and on their expertise with psychometric assessment.
The present study has some limitations. The research did not include grey literature, but a secondary search was carried out in references of all included studies, as suggested by the COSMIN initiative, to minimize the risk of not including studies not found in databases. The evidence found did not allow discrimination of results by stage of the disease, and the performance of PROMs may differ between stages of CKD. It is important to mention that this study focused on adult individuals and the findings cannot extrapolate to pediatric population.
This systematic review found no consensus in the literature for the best PROM for assessing PA in CKD. CKD‐PAQ appears to be a promising tool as it is the only PROM, with a favorable evaluation by both COSMIN RoB and EMPRO. However, this PROM needs more studies evaluating reliability. This review also found a gap in the evaluation and validation of PROMs in mild stages, as well as the assessment of responsiveness. COSMIN RoB and EMPRO tools differed from each other, showing that the choice of the evaluation tool must be guided by the evaluator's expertise and objective.
Implications for Physiotherapy Practice
4.1
Physiotherapists can reduce the impact of functional decline in CKD by assessing and promoting PA in this population. Choosing the best PA assessment tool makes it easier for healthcare professionals to establish functional goals, plan conduct and treatments. PA PROMs are a viable and low‐cost option for clinical use in CKD rehabilitation and monitoring. CKD‐PAQ is a promising tool but lacks reliability and responsiveness studies.
Author Contributions
T.C.S and L.M.M.S conceptualized the study, developed the research protocol, identified articles, extracted data from studies, synthesized, and analyzed the data. G.A.S analyzed the conflicts among the two reviewers. C.A.G., L.M.M.S and T.C.S discussed the results and contributed to the final manuscript. T.C.S wrote the manuscript. L.M.M.S and I.S.S helped on the manuscript final version. All authors reviewed the manuscript and agreed to its published version.
Ethics Statement
The authors have nothing to report.
Consent
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
Permission to Reproduce Material From Other Sources
The authors have nothing to report.
Supporting information
Supporting Information S1
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Aiyegbusi, O. L. , D. Kyte , P. Cockwell , et al. 2017. “Measurement Properties of Patient‐Reported Outcome Measures (PRO Ms) Used in Adult Patients With Chronic Kidney Disease: A Systematic Review.” P Lo S One 12, no. 6: e 0179733. 10.1371/journal.pone.0179733.28636678 PMC 5479575 · doi ↗ · pubmed ↗
- 2Bakker, E. A. , C. Zoccali , F. W. Dekker , T. M. H. Eijsvogels , and K. J. Jager . 2021. “Assessing Physical Activity and Function in Patients With Chronic Kidney Disease: A Narrative Review.” Clinical Kidney Journal 14, no. 3: 768–779. 10.1093/ckj/sfaa 156.33777360 PMC 7986327 · doi ↗ · pubmed ↗
- 3Battaglia, Y. , F. Baciga , F. Bulighin , et al. 2024. “Physical Activity and Exercise in Chronic Kidney Disease: Consensus Statements From the Physical Exercise Working Group of the Italian Society of Nephrology.” Journal of Nephrology 37, no. 7: 1735–1765. 10.1007/s 40620-024-02049-9.39269600 PMC 11519309 · doi ↗ · pubmed ↗
- 4Bull, F. C. , S. S. Al‐Ansari , S. Biddle , et al. 2020. “World Health Organization 2020 Guidelines on Physical Activity and Sedentary Behaviour.” British Journal of Sports Medicine 54, no. 24: 1451–1462. 10.1136/bjsports-2020-102955.33239350 PMC 7719906 · doi ↗ · pubmed ↗
- 5Clarke, A. L. , F. Zaccardi , D. W. Gould , et al. 2018. “Association of Self‐Reported Physical Function With Survival in Patients With Chronic Kidney Disease.” Clinical Kidney Journal 12, no. 1: 122–128. 10.1093/ckj/sfy 080.30746139 PMC 6366129 · doi ↗ · pubmed ↗
- 6Echevarría‐Guanilo, M. E. , N. Gonçalves , and P. J. Romanoski . 2017. “Psychometric Properties of Measurement Instruments: Conceptual Basis and Evaluation Methods ‐ Part I.” Texto & Contexto ‐ Enfermagem 26, no. 4. 10.1590/0104-07072017001600017. · doi ↗
- 7Echevarría‐Guanilo, M. E. , N. Gonçalves , and P. J. Romanoski . 2019. “Psychometric Properties of Measurement Instruments: Conceptual Basis and Evaluation Methods ‐ Part II.” Texto & Contexto ‐ Enfermagem 28: e 20170311. 10.1590/1980-265X-tce-2017-0311. · doi ↗
- 8Fix, A. J. , and D. M. Daughton . 1988. Human Activity Profile – Professional Manual. Psychological Assessment Resources.
