Reliability of durometry to assess firmness of calcinosis lesions in juvenile and adult dermatomyositis
Meghan Corrigan Nelson, Lisa G. Rider, Hanna Kim, Scott Gillespie, Vy Do, Julie Fuller, Kelly Rouster-Stevens, Adam Schiffenbauer

TL;DR
This study evaluates the reliability of using a durometer to measure the firmness of calcinosis lesions in dermatomyositis patients.
Contribution
The study introduces durometry as a novel and reliable quantitative tool for assessing calcinosis in dermatomyositis.
Findings
Durometry showed high intra-rater reliability and moderate to good inter-rater reliability in most anatomical regions.
Calcinosis lesions were firmer than control sites, and measurements correlated with physician assessments.
Inter-rater reliability was poor in specific regions like the thigh and anterior calf.
Abstract
Dermatomyositis (DM) and juvenile dermatomyositis (JDM) are inflammatory myopathies affecting multiple organs, including muscle and skin. Calcinosis is a complication of DM/JDM that causes significant morbidity; however, few tools exist to assess calcinosis in DM/JDM patients. This study aimed to evaluate the reliability of durometry measurements to assess the firmness of calcinosis lesions in DM and JDM patients. Calcinosis firmness was measured using a handheld digital durometer. Six investigators across 3 institutions examined DM/JDM calcinosis lesions by durometry, as well as control readings in healthy unaffected skin/subcutaneous tissue in similar anatomic areas, recording three readings per site. Intra-rater and inter-rater intraclass correlations were evaluated. We enrolled 57 patients and gathered 709 measurements (443 calcinosis lesions; 266 control lesions) over eleven…
Funding
- National Institute of Arthritis and Musculoskeletal and Skin Diseases (http://dx.doi.org/10.13039/100000069)
- National Institute of Environmental Health Sciences (http://dx.doi.org/10.13039/100000066)
- Eli Lilly and Company (http://dx.doi.org/10.13039/100004312)
- Hope Pharmaceuticals
Introduction
Dermatomyositis (DM) and juvenile dermatomyositis (JDM) are idiopathic inflammatory myopathies characterized by proximal skeletal muscle inflammation and skin manifestations [1]. Calcinosis, a complication of DM/JDM, is associated with significant morbidity and worse functional outcomes [1–5]. Calcinosis can present as superficial lesions, deep nodules, or deposits along myofascial planes, with varying firmness [5].
Durometers are handheld devices used to measure firmness by applying a load on various surfaces [6]. In scleroderma, durometry has been used as a measure of skin fibrosis [6–8]. Durometry presents a viable tool for patient encounters, as measurements can be done rapidly and typically involve no discomfort [6]. In addition, studies have found that skin firmness is relatively constant in healthy control subjects between 15 and 65 years of age, allowing for relatively consistent normal values [8].
While previous studies have evaluated the role of durometry in measuring skin firmness in scleroderma, studies assessing durometry in measuring calcinosis are limited [9,10]. While recent studies have shown changes in durometry readings of calcinosis lesions following treatment of JDM patients, the reliability of these measurements has not been addressed and is an important step in validating the use of durometry [9,10]. Durometry may serve as a useful quantifiable tool for the assessment of calcinosis firmness in patients with DM/JDM, which would allow calcinosis firmness to be evaluated as a potential outcome measure in therapeutic studies of calcinosis. Full validation of durometry as an outcome assessment tool requires demonstration of reliability and sensitivity by multiple investigators. The goal of this study was to determine the reliability of repeated calcinosis durometer measurements in DM/JDM patients, both within providers (intra-rater) and between providers (inter-rater), as well as to further characterize calcinosis lesions within our patient cohort.
Materials and methods
Subjects
Patients with DM/JDM meeting probable or definite Bohan and Peter classification criteria [11] and probable or definite ACR/EULAR classification criteria [12] with calcinosis by imaging and/or physician examination were enrolled across three centers. Based on preliminary data, a minimum of 50 patients was determined to be a sufficient sample size to capture clinical differences between patients. Sites of enrollment included the National Institutes of Health (NIH, Bethesda, MD), Children’s Healthcare of Atlanta (CHOA, Atlanta, GA) and Texas Scottish Rite Hospital for Children (TSRHC, Dallas, TX) from 01/12/2015–01/08/2021. Data used in this study were accessed for primary research analyses 01/04/2022 through approximately 09/25/2023. Demographic data, current medications, and assessments of disease activity and damage were obtained at enrollment. Serum from 45 patients was tested for 5 myositis-specific autoantibodies (MSAs) and 3 myositis-associated autoantibodies (MAAs) by standard immunoprecipitation [13] and immunoprecipitation-immunoblotting methods [14]. Ethnicity was self-reported by questionnaire.
Ethics
Patients were enrolled in a myositis natural history study (protocol 94E0165, NCT00017914) approved by the NIH institutional review board; each enrolling site also obtained local IRB approval with written consent (CHOA, TSRHC), and the study was conducted according to the principles expressed in the Declaration of Helsinki. All patients and parents signed informed consent. The data were fully anonymized before access by the research team.
Durometer calcinosis hardness scores
Durometer measurements were made using a handheld digital durometer (Rex Gauge 1600 Type DD-4-00 Durometer, Buffalo Grove, IL) (S1 Fig), which uses a continuous scale measured in standardized durometer units (DU). Standard techniques were developed and durometry was performed similarly across all three centers. Three consecutive durometer measurements were taken at each calcinosis lesion and at each anatomically corresponding control site. Control regions were obtained on non-affected mirrored contralateral sides, where applicable. Measurements were made with the skin in a horizontal plane. The durometer was placed laterally and rolled onto the lesion, consistent with prior durometry studies [8]. The measurements were recorded in a standardized form along with their anatomic location, which was based on 60 anatomic location codes. Location codes were reviewed for lesions per area and location similarity and were combined into larger areas before analysis (S1 Form). Notably, no patients reported pain that limited or restricted the examination. Only gravitational force was applied during measurements, with no additional pressure, thereby eliminating variability due to examiner-applied force.
Intra-rater reliability was assessed by having a single designated (primary) rater, a rheumatology-trained physician, perform durometer measurements on each calcinosis lesion at a single study visit, with repeat measurements performed either at the time of initial examination or within 24 hours of the initial assessment. Dr. Rider evaluated patients < 18 years at the NIH Clinical Center, while Dr. Schiffenbauer assessed those ≥ 18 years, unless a preexisting investigator-patient relationship designated another primary rater. The primary rater at CHOA and at TSRHC was the patient’s treating rheumatologist. Inter-rater reliability was assessed only at the NIH, by comparing measurements from a primary rater to measurements from one or more secondary raters. Lesions were demarcated with a skin marking pen at the targeted sites. Raters performed their assessments blinded to and within 5 days of the other raters’ assessments. Intra- and inter-rater reliability were calculated when five or more assessments were available per anatomic location. When available, calcinosis lesions were qualitatively characterized as hard, wooden, fluctuant, or liquified (S1 Form).
Statistical analyses
All analyses were performed in SAS v.9.4 (SAS Institute, Cary, NC) and CRAN R v.4.2 (R Core Team 2022), and statistical significance was set at the 0.05 threshold. Demographic and clinical features for DM/JDM patients were summarized using medians [interquartile ranges, IQR] or means ± standard deviations (SD) (range) for continuous variables and frequencies (percentages, %) for categorical variables.
Primary outcomes (lesion measurements) were treated as continuous variables and analyzed using ICCs and general linear regression methods with maximum likelihood estimation. The primary exposure, calcinosis versus control lesion site (categorical, 2-level), was examined, with anatomic location (categorical, 11-level) considered as an effect modifier. Therefore, analyses were stratified by anatomic location, and comparisons between calcinosis and control lesion sites were made when applicable. Missing data were assumed to be missing at random (MAR). No imputation was performed; instead, maximum likelihood estimation was used in all regression models to prevent listwise deletion of records with missing data. The ICC methods applied in the analysis to accommodate missing data are detailed below.
Intra- and inter-rater reliability were calculated using intraclass correlations (ICCs). Using guidance from Koo & Li (2016), intra-rater ICCs were derived using single-rater, two-way mixed effects models assessing for absolute agreement, and treating raters as fixed effects and the measurements as random effects [15]. For inter-rater ICCs, absolute agreement, two-way random effects models were utilized, treating both raters and measurements as random effects. Fixed effects were specified for raters in the intra-rater ICC models, as it is not reasonable to generalize one rater’s scores against themselves to a population of raters. Conversely, random effects were specified for raters in the inter-rater ICC models, as the goal was to generalize results to other raters in a real-world setting. All ICCs were accompanied by 95% Wald confidence intervals (CIs). Repeated measurements at the same location for the same rater were averaged for inter-rater analysis. Inter-rater calculations were then performed, comparing averaged durometry assessments from the primary rater to the averaged durometry assessments from the secondary rater. In some situations, measurements from up to 2 secondary raters were available for a single primary rater measurement. These data imbalances were handled with the irrNA package (v.0.2.2) in CRAN R, which calculates ICCs based on continuously scaled data and derives reliability estimates in the presence of imbalanced data without imputation or listwise deletion. Inter-rater ICC values were also calculated using initial observations, instead of averages, as a sensitivity analysis. Given the novel use of durometry in DM/JDM without prior published data regarding its use in this patient population, intra-rater and inter-rater ICCs were interpreted based on prior published generalized cut-offs: excellent (greater than 0.9), good (0.75 to 0.9), moderate (0.5 to 0.75), and poor (less than 0.5) [16].
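As a minimal sketch of the point estimate behind these models, the single-measurement, absolute-agreement ICC (often written ICC(A,1)) can be computed from two-way ANOVA mean squares. The point estimate has the same form under the two-way mixed and two-way random models; the models differ in how the result is generalized. This sketch assumes complete data (every lesion scored in every rater or session column), so it is not the irrNA procedure used for the imbalanced inter-rater data, and it omits the Wald confidence intervals. The function names `icc_a1` and `interpret_icc`, and the choice to assign values lying exactly on a cut-off to the lower category, are illustrative assumptions.

```python
from statistics import mean


def icc_a1(scores):
    """Two-way, absolute-agreement, single-measurement ICC.

    scores: list of rows, one row per lesion, one column per rater
    (or per repeat session for intra-rater use). Complete data only.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two lesions")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least two columns")

    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for lesions (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC undefined: no variation in the data")
    return (msr - mse) / denom


def interpret_icc(icc):
    """Map an ICC to the generalized cut-offs quoted in the text."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"
```

Because the model assesses absolute agreement, a constant offset between two raters lowers the ICC even when their readings rank the lesions identically.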
Finally, a general linear mixed model with raters treated as fixed effects was used to determine if paired calcinosis and control measurements significantly differed by location. Results are reported as least-squares means (LS-means) and LS-mean differences with 95% CIs, p-values, and ranges of the data. Our methods were reviewed by a second independent statistical team who provided feedback on our analyses.
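To make the paired comparison concrete, the sketch below computes an unadjusted paired mean difference (calcinosis minus control) and its standard error for one anatomic location. This is a deliberately simpler stand-in, not the mixed model with fixed rater effects described above, so it will not reproduce LS-means. The function name `paired_difference` and the example readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_difference(pairs):
    """Unadjusted paired comparison for one anatomic location.

    pairs: list of (calcinosis_DU, control_DU) tuples, one per paired site.
    Returns (mean difference, standard error of the mean difference).
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to estimate variability")
    diffs = [calcinosis - control for calcinosis, control in pairs]
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, std_err


# Hypothetical readings in durometer units (DU), for illustration only.
example_pairs = [(42.0, 12.0), (55.0, 20.0), (38.0, 10.0), (60.0, 30.0)]
mean_diff, std_err = paired_difference(example_pairs)
print(f"mean difference = {mean_diff:.2f} DU, SE = {std_err:.2f} DU")
```

A positive mean difference indicates that calcinosis sites read firmer than their matched control sites.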
Results
Fifty-seven patients were enrolled in the study; 30 (55%) had JDM and 27 (45%) had DM. There was a predominance of Caucasian (58%) and female (63%) individuals with a median age of 18 years (interquartile range 14–30 years) (Table 1). A total of 45 patients were assessed at the NIH, 8 patients were assessed at CHOA, and 4 patients were assessed at TSRHC. There were 3 raters (NIH), 1 rater (CHOA), and 1 rater (TSRHC) available for assessments, respectively. Overall, there were 129 patient-rater interactions and 11 unique anatomic location codes, for a total of 398 independent lesions with 709 durometry assessments of both calcinosis and control lesions. There were 33 patients with 176 durometry measurements who had multiple assessors that could be used for inter-rater reliability analysis; of those, 112 (64%) had 3 assessors and 64 (36%) had 2 assessors.
Table 1: Demographic and Clinical Features of 57 Adult and Juvenile Dermatomyositis Patients in the Study.
The mean Physician Global Activity was 3.3 ± 2.5 and the mean Childhood Myositis Assessment Scale score was 42.7 ± 9.9 (Table 1). Patients also had damage, indicated by the mean Myositis Damage Index (MDI) severity of 13.8 ± 9.9 (Table 1). Approximately 67% of patients had a chronic continuous disease course, and only 3% had a monocyclic course (Table 1).
A total of 709 durometric measurements were obtained, consisting of 443 calcinosis and 266 control lesion measurements, and included 244 paired calcinosis and control assessments from 60 specific locations. The range of measurements for calcinosis lesions was 2.2 to 87.1 durometry units (DU) and the range for control sites was 1.6 to 73 DU. Anatomic sites that were similar in location and with similar underlying bony architecture were combined, such that the 60 anatomic location codes were condensed to 11 anatomic sites. The 11 anatomic sites included upper neck/clavicle, back/torso, upper arms bilaterally, forearms bilaterally, elbows, hands/wrists, buttocks, thigh, anterior calf, posterior calf, and feet (S1 Table). All patients were able to complete the examination, and no injuries were reported.
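The collapsing step and the five-assessment threshold for reliability analysis can be sketched as a lookup followed by a count. The location codes below are hypothetical placeholders; the study's actual 60 codes and their grouping into 11 sites are given in S1 Form and S1 Table, not here. The names `SITE_BY_CODE` and `sites_with_enough_data` are likewise illustrative.

```python
from collections import Counter

# Hypothetical codes mapped to collapsed anatomic sites (illustration only).
SITE_BY_CODE = {
    "L_ELBOW": "elbows",
    "R_ELBOW": "elbows",
    "L_FOREARM_DORSAL": "forearms",
    "R_FOREARM_VOLAR": "forearms",
    "L_THIGH_ANT": "thigh",
}

# Reliability was calculated only with five or more assessments per location.
MIN_ASSESSMENTS = 5


def sites_with_enough_data(codes, minimum=MIN_ASSESSMENTS):
    """Collapse detailed codes to sites and keep sites meeting the minimum.

    codes: iterable of detailed location codes, one per assessment.
    Returns a dict of {site: assessment count} for eligible sites.
    Raises KeyError for a code missing from the mapping.
    """
    counts = Counter(SITE_BY_CODE[code] for code in codes)
    return {site: n for site, n in counts.items() if n >= minimum}
```

Raising on an unmapped code, rather than silently skipping it, is a design choice meant to surface data-entry errors before any site counts are interpreted.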
Overall, our study indicated good to excellent intra-rater reliability in durometry measurements (Table 2). For calcinosis sites, intra-rater ICCs ranged from 0.75 to 0.93, with 4 anatomic sites demonstrating excellent ICCs (thigh, elbows, hands/wrists, and foot) and 7 sites demonstrating good ICCs (back/torso, posterior calf, buttocks, anterior calf, upper arms, upper neck/clavicle, and forearms).
Table 2: Intra-rater and inter-rater reliability for calcinosis and control lesions for DM/JDM patients by anatomic location.
Since intra-rater results were strong, repeated measurements at the same location from the same rater were averaged prior to inter-rater analysis. Inter-rater reliability could not be calculated for four calcinosis locations and six control locations due to too few observations (< 5 measurements). In the subset of patients with inter-rater data available, inter-rater reliability ranged from good to poor, depending on anatomic location. Inter-rater ICCs ranged from 0.30 to 0.85 (Table 2). For calcinosis lesions, the upper arms, elbows, and posterior calf had good inter-rater ICCs. For control lesions, forearms had a good inter-rater ICC. The sensitivity analysis performed for inter-rater ICC values using the first durometry observation rather than averaged durometry values yielded overall similar results, except at the anterior calf, where the ICC difference was 0.21 (S3 Table).
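The two inputs compared in the sensitivity analysis can be sketched as two summaries of the same repeated readings: the average per rater and lesion (primary analysis) and the first reading only (sensitivity analysis). The data layout, the function name `summarize_readings`, and the example values are assumptions made for illustration; the ICC step itself is not shown.

```python
from statistics import mean


def summarize_readings(readings, use_first=False):
    """Reduce repeated durometer readings to one value per lesion and rater.

    readings: dict mapping (lesion_id, rater_id) to a list of consecutive
    readings in durometer units (DU).
    use_first=False averages the readings (primary analysis);
    use_first=True keeps only the first reading (sensitivity analysis).
    """
    summary = {}
    for key, values in readings.items():
        if not values:
            continue  # no readings recorded for this lesion/rater pair
        summary[key] = values[0] if use_first else mean(values)
    return summary


# Hypothetical readings, for illustration only.
example = {
    ("lesion1", "rater_A"): [40.0, 42.0, 44.0],
    ("lesion1", "rater_B"): [38.0, 39.0, 43.0],
}
averaged = summarize_readings(example)
first_only = summarize_readings(example, use_first=True)
```

Running the same reliability calculation on `averaged` and on `first_only` then shows how much the estimate depends on averaging.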
Significant differences in durometry measurements were seen between calcinosis and control sites. Paired analyses of measurements for calcinosis versus control sites were significantly different, with greater LS-mean measurements for calcinosis sites relative to control sites at each anatomic location (all p < 0.01, Table 3). Notably, the locations with the highest LS-mean difference between calcinosis and control sites were forearms (LS-mean difference: 31.6), upper arms (28.1), and foot (24.4). Elbows (10), posterior calf (15.8), and upper neck/clavicle (16.9) had the smallest LS-mean difference between calcinosis and control sites (Table 3).
Table 3: Paired mixed model least-squares mean (LS-means) differences in durometry measurements between calcinosis and control lesions, by anatomic location.
Among the primary rater’s qualitative descriptors, the most frequently reported characteristic of calcinosis lesions was ‘hard’, followed by ‘wooden’; the least frequently reported was ‘fluctuant/liquified’ (S2 Table). Overall, ‘fluctuant/liquified’ calcinosis lesions were associated with lower durometry measurements [mean 32.1 ± 13.3 DU] than ‘hard’ calcinosis lesions, which were associated with the highest durometry measurements [mean 42.7 ± 16.7 DU] (Table 4). Intra-rater ICCs were generally good to excellent for all hard, wooden, fluctuant/liquified, and combined lesions (S2 Table).
Table 4: Characteristics of calcinosis lesions and corresponding durometry measurements in patients with DM and JDM at initial visit.
Discussion
This study demonstrates the reliability and content validity of durometry in evaluating calcinosis, and helps to quantitatively characterize calcinosis in DM/JDM patients. Our study shows the viability of durometry as a measure of calcinosis, given the high intra-rater reliability and predominantly moderate to good inter-rater reliability, although poor inter-rater reliability was noted at the thigh and anterior calf. Additionally, this study confirms durometry’s content validity, with greater firmness of lesions among pooled calcinosis versus control sites across anatomic regions. There is also evidence for the construct validity of durometry, as qualitative descriptions of lesions correlate with the variations observed in quantitative durometry measurements (Table 4, S2 Table). Moreover, there is evidence for face validity, given the direct measurement of calcinosis by this tool in DM (Table 2), as well as its historical use in scleroderma as an effective tool in measuring skin firmness [5–8].
Overall, our results suggest that durometry measurements are reliable in DM/JDM patients, and therefore durometry could be a useful tool in clinical practice and future therapeutic trials for calcinosis. Most inter-rater assessments indicated moderate to good reliability, with variation between anatomic areas. Inter-rater reliability in the thigh was decreased for both calcinosis and control sites. This suggests that there may be an inherent heterogeneity of durometry readings in this anatomic location. Sensitivity analysis also revealed decreased inter-rater ICC for the anterior calf. This heterogeneity by anatomic site highlights the importance of picking an appropriate site for evaluation. Of note, the elbows offered excellent intra-rater reliability and good inter-rater reliability, and they are a relatively common site of calcinosis in DM/JDM patients.
Limitations of this study include small sample sizes for some anatomic regions, which limited statistical comparisons, and the fact that inter-rater comparisons were performed at only one study site. While our study utilized three academic centers across the United States, future studies should increase the number of participants and centers. This study examined inter-rater reliability, capturing variability between users, which was moderate to good in most anatomic locations. Given that providers had limited experience with durometers prior to this study, additional training may further improve inter-rater agreement. Overall, the differences in anatomic location were subtle and should be substantiated with further studies. Caution should be used when selecting potential sites for measurement of calcinosis durometry readings due to the aforementioned variability in readings. Another limitation is the absence of data from serial evaluations and limited lesion size data. Due to the limited physical area detected by durometry at a given time, it would not be practical to detect calcinosis in a large anatomic area; however, we were able to successfully capture characteristics of calcinosis due to firmness in a more targeted area, where it helped determine calcinosis versus control status. Furthermore, durometry has not been evaluated for predictive validity. Additionally, there is a need to correlate durometry measurements with clinical outcome measures and disease progression.
Overall, our study identifies durometry as a reliable, quantitative tool in assessing calcinosis in DM/JDM patients, of particular importance given the current lack of quantifiable measures to assess calcinosis firmness. Broad implementation of durometry across institutions could enable more robust comparisons among clinical trial cohorts and future observational studies. Durometry offers a tool to objectively assess calcinosis during patient encounters, with potential utility in providing an objective and reliable measurement of calcinosis for clinical trials. We demonstrated content validity in DM/JDM patients given greater firmness of lesions among calcinosis versus control sites by anatomic regions, as well as acceptable reliability. While durometry is a promising method for quantitative assessment of cutaneous calcinosis in DM/JDM, additional studies with measurement over time and relationship to clinical outcomes are needed to further validate durometry and substantiate its role in the assessment of calcinosis in DM/JDM patients.
Supporting information
S1 Fig. Image of Handheld Digital Durometer. Similar make and model of the handheld digital durometer (Rex Gauge 1600 Type DD-4-00) utilized for quantitative assessment of calcinosis firmness. (DOCX)
S1 Form. Calcinosis Type – Sentinel Lesion Form. Calcinosis lesions were categorized by type and anatomic location and recorded in this Sentinel Lesion Form. (DOCX)
S1 Table. Collapsed anatomic sites derived from Sentinel Lesion Form. Detailed anatomic site responses from the Sentinel Lesion Form were reviewed and combined into harmonized categories to facilitate downstream analyses. This table summarizes each collapsed site grouping and its component locations. (DOCX)
S2 Table. Intra-rater ICCs and 95% CI for calcinosis assessments by density characteristics for DM/JDM patients. This table presents intra-rater intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals for calcinosis assessments stratified by density characteristics in patients with dermatomyositis and juvenile dermatomyositis (DM/JDM). (DOCX)
S3 Table. Sensitivity analysis for inter-rater reliability of calcinosis and control assessments by Anatomic Location for DM/JDM patients. This table summarizes sensitivity analyses of inter-rater agreement for calcinosis and control assessments across specific anatomic locations in DM/JDM participants, providing estimates of reliability under alternative analytic conditions. (DOCX)
References
1. Chung MP, Richardson C, Kirakossian D, Orandi AB, Saketkoo LA, Rider LG, et al. Calcinosis biomarkers in adult and juvenile dermatomyositis. Autoimmun Rev. 2020;19:102533–42. doi: 10.1016/j.autrev.2020.102533. PMID: 32234404. PMCID: PMC7225028.
2. Wienke J, Deakin CT, Wedderburn LR, van Wijk F, van Royen-Kerkhof A. Systemic and Tissue Inflammation in Juvenile Dermatomyositis: From Pathogenesis to the Quest for Monitoring Tools. Front Immunol. 2018;9:2951. doi: 10.3389/fimmu.2018.02951. PMID: 30619311. PMCID: PMC6305419.
3. Tansley SL, Betteridge ZE, Shaddick G, Gunawardena H, Arnold K, Wedderburn LR, et al. Calcinosis in juvenile dermatomyositis is influenced by both anti-NXP2 autoantibody status and age at disease onset. Rheumatology (Oxford). 2014;53(12):2204–8. doi: 10.1093/rheumatology/keu259. PMID: 24987158. PMCID: PMC4241891.
4. Gutierrez A Jr, Wetter DA. Calcinosis cutis in autoimmune connective tissue diseases. Dermatol Ther. 2012;25(2):195–206. doi: 10.1111/j.1529-8019.2012.01492.x. PMID: 22741938.
5. Huber AM, Lang B, Le Blanc CM, Birdi N, Bolaria RK, Malleson P, et al. Medium- and long-term functional outcomes in a multicenter cohort of children with juvenile dermatomyositis. Arthritis Rheum. 2000;43(3):541–9. doi: 10.1002/1529-0131(200003)43:3<541::AID-ANR9>3.0.CO;2-T. PMID: 10728746.
6. Merkel PA, Silliman NP, Denton CP, Furst DE, Khanna D, Emery P, et al. Validity, reliability, and feasibility of durometer measurements of scleroderma skin disease in a multicenter treatment trial. Arthritis Rheum. 2008;59(5):699–705. doi: 10.1002/art.23564. PMID: 18438905. PMCID: PMC3887555.
7. Kissin EY, Schiller AM, Gelbard RB, Anderson JJ, Falanga V, Simms RW, et al. Durometry for the assessment of skin disease in systemic sclerosis. Arthritis Rheum. 2006;55(4):603–9. doi: 10.1002/art.22093. PMID: 16874783.
8. Aghassi D, Monoson T, Braverman I. Reproducible measurements to quantify cutaneous involvement in scleroderma. Arch Dermatol. 1995;131(10):1160–6. doi: 10.1001/archderm.1995.01690220066013. PMID: 7574833.
