Scan/rescan reliability of magnetic resonance imaging (MRI)
Menekse Salar Barim, M. Fehmi Capanoglu, Richard F. Sesek, Sean Gallagher, Mark C. Schall, Ronald J. Beyers, Gerard A. Davis

TL;DR
This study evaluates how reliable MRI measurements of lower lumbar vertebrae are when different operators and analysts are involved.
Contribution
The study introduces a comprehensive evaluation of MRI reliability that includes imaging, image selection, and analysis.
Findings
Mean absolute differences in measurements were 4.05% in worst-case scenarios involving different MRI operators and analysts.
Variability in measurements was minimal and mainly due to breathing artifacts.
MRI demonstrates robust reliability for biomechanical assessments of bony structures.
Abstract
Magnetic resonance imaging (MRI) is increasingly used to estimate the geometric dimensions of lower lumbar vertebrae. While MRI-based measurements have demonstrated good reliability with interclass correlation coefficients (ICCs) of 0.80 or higher, many evaluations focus solely on the comparison of identical MRI images. This approach primarily reflects analyst dexterity and does not assess the reliability of the entire process, including imaging and image selection. To evaluate the inter- and intra-rater reliability of the entire process of using MRI to measure biomechanically relevant lumbar spinal characteristics, incorporating imaging, image selection, and analysis. Methods: A dataset of 144 low-back MRI scans was analyzed. Reliability assessments were performed under different conditions: (1) identical scans rated by the same analyst at different times (intra-rater reliability) and…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6- —http://dx.doi.org/10.13039/100000125National Institute for Occupational Safety and Health
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpine and Intervertebral Disc Pathology · Medical Imaging and Analysis · Musculoskeletal pain and rehabilitation
Introduction
Since the 1980’s, magnetic resonance imaging (MRI) has gained popularity as a diagnostic tool for musculoskeletal disorders [1], particularly to assess patient lumbar spinal health. MRI has benefits for imaging the musculoskeletal system [2–4], facilitating better visualization of anatomic and potentially pathologic structures, including cartilage, bones, and ligaments [2, 5–7]. Improved imaging methods have provided better means to measure low back structures’ size and relative position. Many surgeons rely on MRI as an accurate, noninvasive diagnostic method and rationale for medical decisions, including lumbar disc implants.
However, MRI remains relatively expensive. Considering the role that economics plays in patient management, questions arise regarding when and how often an MRI should be administered and the repeatability and accuracy of the images themselves. Bennet and Miller succinctly describe the importance of this endeavor: “Reliability is the cornerstone of any scientific enterprise. Issues of research validity and significance are relatively meaningless if the results of our experiments are not trustworthy. It is the case that reliability can vary greatly depending on the tools being used and what is being measured. Therefore, it is imperative that any scientific endeavor be aware of the reliability of its measurements.” [8].
Sowell et al. [9] asserted that performing repeated scans on relatively few subjects acquired within the same scan session (i.e., the subject remains in position in the scanner) or within concise scan intervals (e.g., subjects removed from the scanner and then scanned again minutes later) may greatly underestimate the sources of variability within and between studies on MRI-derived measurements. However, such a study, while proposed by Sowel et al., was not conducted. To investigate the reliability and accuracy of MRI and the robustness of the process itself, a comprehensive scan-rescan study was conducted.
Rovaris et al. [10] suggested that scan-rescan variability should be compared with the intra-rater variability with three repeated volume measurements of the same scan. However, evidence of such studies is limited. Existing databases of vertebral and intervertebral dimensions tend to be limited with respect to measures of reliability/repeatability with relatively narrow study populations and/or parameters recorded [11]. In addition, most datasets on comprehensive accurate lumbar vertebrae and muscle geometry have not included sex differences. Both men and women of varying ages were included in several studies [12–14]. The results varied widely among studies, possibly due to the different sample sizes and age groups.
The objective of this study was to assess the inter- and intra-rater reliability of the MRI process itself. Few studies have addressed the overall reliability of MRI measures using distinct scans to measure the same parameters. Our hypothesis was to explore the “worst-case” scenario for biological parameter estimation reliability by comparing separate scans rather than re-examining identical scans a second time.
Materials and methods
Subjects
MRI scans of the lumbar intervertebral segments (L2-S1) and trunk/core muscles of thirty-six (36) subjects (20 males and 16 females) were included in this study. All subjects were asymptomatic and older than 19 years of age. All subjects were scanned on a 3 T scanner using a standardized T2 weighted protocol.
None of the subjects reported a history of activity-limiting chronic back or leg injuries, nor experienced any low back pain at the time of the MRI scan. In accordance with Lee et al.’s (1988) exclusion criteria,”Potential participants who had (1) Degenerative changes in the lumbar spine (e.g., crushed vertebral body, trauma, etc.) and/or Erector Spinal muscles (ESMs) (e.g., atrophy); (2) Obvious spinal deformities; or (3) Any known pathology relevant to and likely to alter low back geometry (e.g., scoliosis and tumor) were not included in this study. Subjects were consented in accordance with the Institutional Review Board (IRB) at Auburn University.
Magnetic resonance imaging
Lumbar MRI scans (L2-S1) of the subjects were performed using a 70 cm Open Bore 3 T scanner (MAGNETOM Verio, Siemens AG, Erlangen, Germany) at the Auburn University MRI Research Center. All subjects were examined in a head-first-supine position (head towards the magnet with hands lying across their abdomen) with arm supports (available upon request) and leg support (mandatory). MRI coil selection included the in-table spine coils and a flexible body coil placed in between the pubic bones at the middle-lower anterior region of the abdomen. The flexible body coil was secured with straps to maximize signal consistency. The imaging protocols included a standard morphological T2-weighted turbo-spin-echo (TSE) sequence in the sagittal plane, Axial Continuous T2-weighted TSE, and Axial Multi-group T2-weighted images with the following parameters:
- T2-weighted spin-echo with a repetition time (TR) of 3440 ms and an echo time (TE) of 41 ms.
- The section thickness was 3 mm with 385 FoV read and 100% FoV phase.
All subject survey and MRI data were anonymized and linked using a unique subject ID unrelated to their personal information. Data were archived and stored on a secure server.
MRI reading procedure
Two researchers with experience collecting low back MRI images collected and measured all of the MRI data. One of the operators performed a localizer scan to verify subject placement (aligned straight on the scanner). A localizer scan was obtained to assist in subject placement and allow the analyst to focus on relevant regions of interest (see Fig. 1).Fig. 1. Localizer MRI Scan
The two MRI analysts reviewed the scans separately. Data were collected from two studies [15, 16] (n = 26 and n = 10, respectively). Images were analyzed a second time after one month. The order of presentation of images was randomized for each analysis. Both analysts were blinded to subject identity. During the MRI interpretation, which considered all MRI sequences, the operators could freely adjust image brightness, contrast, and zoom, to select the slices that they felt were best for measuring the parameters of interest (e.g., disc size, muscle lever arms, etc.).
In order to evaluate MRI reliability, repeated scans with short inter-scan time intervals were performed. Reliability for the entire process was evaluated using a worst-case scenario that compared two distinct scans of the same subject with different analysts positioning subjects and performing each MRI scan (imaging protocol is shown in Fig. 2). In addition, the analysts measured each scan using Osirix^©^ software; their own scans, and the scans performed by their colleague.Fig. 2MRI Scan/Rescan Procedure
Measurement of lumbar regions (L2-S1)
Axial and Sagittal MRI scans were analyzed using an open-source digital imaging and communications in medicine (DICOM) software, OsiriX, 8.0.1, 32bit, (Bernex, Switzerland). At each spinal level from L2 to S1, 14 measurements were performed and shown in Figs. 3, 4, and 5. Abbreviations are used for these 14 measurements and are shown in Table 1.Fig. 3. Sagittal MRI scan with measurements of A-FFig. 4Sagittal MRI scan with measurements of H and IFig. 5Axial MRI scan measurements of J–NTable 1Abbreviations for measurementsDescription (Fig. representation)Abbreviation(A) Sagittal Vertebrae Body WidthSVBW(B) Concavity HeightCH(C) Anterior Vertebrae HeightAVH(D) Posterior Vertebrae HeightPVH(E) Superior Vertebrae Body LengthSVBL(F) Inferior Vertebrae Body LengthIVBL(G) Sagittal Vertebrae Body HeightSVBH(H) Anterior IVD HeightAIVDH(I) Posterior IVD HeightPIVDH(J) Cross Sectional areas of Psoas RightPR(K) Cross Sectional areas of Psoas LeftPL(L) Cross Sectional areas of Erector Spinae RightESR(M) Cross Sectional areas of Erector Spinae LeftESL(N) Disc SizeDAL
Statistical analysis
The agreement between the examiners (inter-rater) and within each examiner (intra-rater) was analyzed using intra-class correlation coefficients (two-way mixed models: ICC_3,1_) where analysts were fixed variables and subjects were random variables. Muscle Cross-sectional Area (CSA) and Intervertebral Disc (IVD) measurements were stacked together (from L2 to S1 level) to estimate an overall ICC value rather than estimating an individual score for each IVD level. Combining these measurements from multiple levels of the spine (L2 to S1), we obtain a more comprehensive assessment of the overall relationship between muscle CSA and IVD characteristics. In addition to this, aggregating data across multiple levels increases the sample size and statistical power of the analysis. These analyses were performed in SPSS (version 19). Bland–Altman plots were drawn to better visualize the data.
Mean Absolute Percent Differences (MAPDs) were calculated as:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$MAPD= \frac{\sum_{i=1}^{n}\left|\frac{\left({X}_{i}- {Y}_{i}\right)}{\left({X}_{i}+ {Y}_{i}\right)/2}\right|}{n} \times 100$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${X}_{i}$$\end{document} is the first measurement, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${Y}_{i}$$\end{document} is the second measurement, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n$$\end{document} is the total number of observations.
Since \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${X}_{i}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${Y}_{i}$$\end{document} are two distinct measurements and there is no ground truth, the scores were averaged when estimating error to minimize bias.
Results
Descriptive statistics
In total, 144 MRI scans were obtained from 36 subjects. Table 2 presents the demographic data (age, sex, height, and weight). The average age was 23.7 years for males (SD 3.1) and 25.4 years for females (SD 4.8). At each lumbar level (L2-S1), measurements were taken for each IVD and Vertebral body by manually identifying and tracing the actual shape of the structures.Table 2. Mean (SD) anthropometric and demographic data for the male and female subjectsSexAge (yr)Height (cm)Weight (kg)BMI (kg/m^2^)Female (n = 16)25.4 (4.8)164.4 (6.6)64.6 (9.4)24 (3.9)Male (n = 20)23.7 (3.1)178.2 (8.9)75.5 (12.2)23.8 (3.6)
Scan agreement
Two analysts measured all parameters three times with at least one month between repeated measurements of the same scan to assess the reliability and the repeatability of measurements. Data from six sets of measurements were compared. For each lumbar region analyzed (e.g., L4-L5), there were 50 distinct slices from which researchers could choose to perform their measurements. In order to test the reliability of the overall process, specific image slices were not pre-selected for the analysts prior to measurements. Each observer chose the slice they thought most appropriate for measuring the parameters. A comparison of different scans by operators and reviewed/measured by different analysts had not been previously conducted using a substantive sample size. The agreement of analysts in choosing the same slice or within one slice is shown in Table 3. The results show that the same slice was selected 61% of the time, and selections were within one slice over 90% of the time. The agreement was highest when researchers reanalyzed their own scans and was lowest but still excellent when they measured each other’s scans. Analysts waited at least one month between analyses of the same scans, with scans presented in random order each time.Table 3. Probability of selecting identical slice for analysisAnalysisAbsolute agreement ± 1 sliceWithin analyst0.790.98Between analysts0.500.85Overall0.610.90
Table 3 shows that when analysts repeated an analysis of a particular scan, they selected the same slice 79% of the time. Different analysts reviewing the same scan selected identical slices 50% of the time. Overall, it should be noted that analysts were within one slice of each other’s scans 90% of the time and did not differ by more than two slices (6 mm).
The mean absolute errors for each IVD level were reported as percentages in Tables 4 and 5. The results show that bony structures have lower error percentages except for AIVDH and PIVDH. In general, the smaller structures had bigger percentage errors. Moreover, when the analysts evaluated their own scans, error percentages decreased. Table 4. Summary of Bland–Altman analysesVariablesCorrelation between analystsMean error (cm)Standard Deviation (cm)% differenceLimits of agreement = Upper LOA—Lower LOAAVH0.71− 0.030.2020.07PVH0.73− 0.090.1970.18SVBW0.79− 0.100.2870.20CH0.480.030.6230.06SVBL0.64− 0.110.2470.22IVBL0.87− 0.070.3150.13SVBH0.67− 0.030.2030.06AIVDH0.60− 0.361.67100.72PIVDH0.57− 0.250.90110.50PR0.69− 0.301.7550.60PL0.69− 0.431.7570.87ESR0.761.992.63223.99ESL0.771.592.51183.19DAL0.62− 0.330.9550.66*Anterior Vertebrae Height (AVH); Posterior Vertabrae Height (PVH); Sagittal Vertebrae Body Width (SVBW); Concavity Height (CH); Superior Vertebrae Body Length (SBVL); Inferior Vertebrae Body Length (IVBL); Sagittal Vertebrae Body Height (SVBH); Anterior IVD Height (AIVDH); Posterior IVD Height (PIVDH); Cross sectional areas of Psoas Right (PR); Cross sectional areas of Psoas Left (PL); Cross sectional areas of Erector Spinae Right (ESR); Cross sectional areas of Erector Spinae Left (ESL); Disc Size (DAL)Table 5. Mean Absolute Percentage Difference (MAPD) for Measurements of Intervertebral Discs and MusclesDimensionSet of scan slices/AnalystL2/L3L3/L4L4/L5L5/S1OverallAIVDHSame/Same^1^5.665.404.003.924.74Different/Same7.005.955.875.696.13Same/Different13.8111.4212.1113.0012.58Different/Different^2^13.6211.3011.8713.4212.55PIVDHSame/Same^1^7.426.525.615.876.35Different/Same9.726.457.146.627.48Same/Different15.9714.1012.3914.2414.18Different/Different^2^15.9513.6812.3614.5814.14CHSame/Same^1^8.127.336.827.097.34Different/Same8.618.697.7110.318.83Same/Different17.0617.2616.1325.9919.11Different/Different^2^16.0616.9915.9624.9918.50Psoas RSame/Same^1^7.286.765.706.776.63Different/Same12.257.966.276.508.24Same/Different17.2214.089.898.9612.54Different/Different^2^17.0714.719.948.7112.61Psoas LSame/Same^1^8.346.715.866.946.96Different/Same12.578.695.967.838.76Same/Different20.7414.559.278.8713.36Different/Different^2^20.6114.498.828.4113.08Erector Spinae RSame/Same^1^5.004.456.466.975.72Different/Same5.875.298.1314.748.51Same/Different5.566.0913.3326.9812.99Different/Different^2^5.926.6413.4924.5112.64Erector Spinae LSame/Same^1^4.775.395.078.515.94Different/Same6.655.976.9112.648.04Same/Different6.675.2612.9821.0611.49Different/Different^2^6.745.7011.9918.6310.76Disc sizeSame/Same^1^2.413.322.603.683.00Different/Same3.733.423.314.313.69Same/Different4.223.844.515.584.54Different/Different^2^4.984.234.595.364.79^1^Expected “best case” (same set of scan slices analyzed by the same analyst)^2^Expected “worst case” (different set of scan slices analyzed by different analyst)
The ICCs for intra-rater reliability show high to excellent (0.806–0.989) agreement for both analysts and are reported in Table 6. The first sets of measurements for both analysts were compared to evaluate inter-rater agreement. The ICC results indicated high to excellent agreement for most of the measurements except PIVHD, SVBL, Disc Size, and CH. It should be noted that CH is a two-step measurement requiring analyst judgment [16], therefore can be more subjective then the other part measurements.Table 6. Intra-class Correlation Coefficients Table for Inter- and Intra-rater ReliablityDimensionIntra-rater reliability for analyst 1 and analyst 2Inter-rater reliabilityAnalyst 1Analyst 2Analyst 1 and 2ICC3,195% CIICC3,195% CIICC3,195% CISVBW0.965(0.953–0.974)0.964(0.952–0.973)0.856(0.741–0.912)CH0.939(0.916–0.955)0.849(0.793–0.890)0.608(0.494–0.702)AVH0.918(0.892–0.938)0.939(0.919–0.954)0.845(0.657–0.917)PVH0.946(0.928–0.959)0.924(0.890–0.946)0.806(0.396–0.913)SVBL0.896(0.863–0.922)0.948(0.931–0.961)0.732(0.474–0.847)IVBL0.968(0.957–0.976)0.984(0.979–0.988)0.908(0.866–0.936)SVBH0.958(0.943–0.968)0.950(0.934–0.963)0.835(0.768–0.882)AIVDH0.964(0.950–0.974)0.959(0.943–0.970)0.832(0.686–0.901)PIVDH0.821(0.759–0.868)0.897(0.860–0.925)0.710(0.594–0.793)Psoas R0.944(0.924–0.960)0.975(0.966–0.982)0.888(0.809–0.930)Psoas L0.963(0.949–0.974)0.979(0.971–0.985)0.908(0.846–0.942)Erector S. R0.968(0.955–0.977)0.989(0.985–0.992)0.911(0.835–0.947)Erector S. L0.967(0.954–0.976)0.988(0.983–0.991)0.925(0.891–0.948)Disc Size0.806(0.739–0.857)0.854(0.802–0.893)0.733(0.580–0.824)
Bland–Altman analysis
The Bland–Altman method for repeated measurement was performed to evaluate inter-rater reliability of the analysts for all regions. A representative Bland–Altman plot (Posterior Intervertebral Disc Height [PIVDH]) is shown in Fig. 6. A summary of the Bland–Altman analyses are in Table 7. The main purpose of using these plots is to visualize the agreement between analysts. In these plots, the Y-axis represents the difference between researchers’ readings, and the X-axis represents the average differences between analyst readings. Bias represents the mean of the differences in all measurements. The lower and upper levels of agreement (LOA) represent the mean difference’s 95% confidence interval (± 1.96 SD).Fig. 6. Bland Altman Plot of Posterior Intervertebral Disc Height** (**PIVDH) for Inter-rater Reliablity of Analyst 1 and 2Table 7Mean Absolute Percentage Difference (MAPD) for Measurements of Vertebral BodiesDimensionSet of Scan Slices/AnalystL2L3L4L5S1OverallSVBWSame/Same^1^2.312.382.552.764.502.90Different/Same3.782.913.063.326.803.98Same/Different4.454.726.286.489.606.31Different/Different^2^4.894.755.826.629.416.30AVHSame/Same^1^3.082.582.682.642.832.76Different/Same3.683.103.293.484.213.55Same/Different4.233.383.844.104.824.07Different/Different^2^4.693.413.653.974.544.05PVHSame/Same^1^2.482.502.392.723.232.66Different/Same3.313.042.893.484.633.47Same/Different4.564.154.585.136.875.06Different/Different^2^4.354.044.094.345.854.53SVBLSame/Same^1^2.242.402.052.213.412.46Different/Same3.822.712.623.124.513.36Same/Different4.734.165.195.915.735.14Different/Different^2^5.744.304.444.985.545.00IVBLSame/Same^1^2.052.102.513.066.923.33Different/Same3.342.853.163.907.904.23Same/Different4.324.865.077.0714.257.11Different/Different^2^4.474.204.356.4514.236.74SVBHSame/Same^1^2.532.392.232.242.792.44Different/Same3.423.482.913.223.573.32Same/Different4.764.783.533.725.784.51Different/Different^2^4.934.673.453.636.084.55^1^Expected “best case” (same set of scan slices analyzed by the same analyst)^2^Expected “worst case” (different set of scan slices analyzed by different analyst)
The lowest mean differences were observed in the Anterior Vertebrae Height (AVH) and the Sagittal Vertebrae Body Height (SVBH) by − 0.03 cm. The highest mean differences were observed in the muscle groups (Psoas R, Psoas L, ESR, ESL) and the Disc Size. The highest mean differences are − 0.30, − 0.43, 1.99, 1.59, and − 0.33 (cm^2^), respectively (shown in Table 7). A total of 14 measurements were made.
Positive correlations were exhibited between researchers for all measurements (Table 7). Some of these relationships were stronger than others. As complexity of tracing increases, so does the variability. Analysts have closer agreement in vertebral body measurements where the bony landmarks are highly visible. On the other hand, IVD and muscle CSA tracing is much more difficult. These, less well-defined structures were more difficult to measure and were more impacted by subject movement artifacts during scanning.
Discussion
Morphometry of the trunk muscle CSAs and vertebral bodies have been used for various purposes such as inputs to biomechanical models or to assist in medical diagnoses. To increase trust in MRI data used for these purposes, the procedures and methodologies used to collect that data should be evaluated.
A scan-rescan analysis using repeated scans with short inter-scan time intervals facilitates the assessment of the reliability of the acquired imaging data and increases confidence in the consistency of results [17]. It has been shown that scan results may sometimes show differences when processed with different techniques. Some studies [18] have argued that repeated MR scanning of the same subject, even if using the same analyst and acquisition parameters, does not result in identical representations due to small changes in subject/image orientation, changes in pre-scan parameters, and magnetic field instability. Morey et al. [18] also stated that these differences might lead to appreciable changes in volume estimates for different structures. The accuracy and repeatability of the measurement techniques themselves, however, have not typically been reported in detail. For example, some medical studies focused on the variability of the diagnosis using MRI scans rather than on the reliabilty of the scans themselves. For example, Herzog et al. [19] had a single subject visit 10 different MRI centers for a diagnosis. While they tested the reliability of diagnoses performed by each center’s medical professionals, they did not compare the actual scans themselves. In the present study, the number of subjects was large enough to explore the repeatability of the measurement process itself and to provide accurate information regarding geometric dimensions of both vertebral structures and muscle CSAs.
To our knowledge, AVH and PVH has been studied more often than the other vertebral measurements. In the current study, these parts have shown excellent ICC values for intra-rater (ICC = 0.918, 0.924) and good inter-rater (ICC = 0.845, 0.806) reliability for AVH and PVH, respectively. Hong et al. [20] evaluated AVH to PVH ratio measured by three observers and reported 0.753 for intra-rater and 0.793 for inter-rater agreement. In another study Yao et al. [21] reported excellent reliability coefficients (0.90–0.99) for AVH, PVH, SVBH, AIVDH, and PIVDH. Data from 40 vertebral bodies and 32 intervertebral discs were obtained from 8 cadavers and measured by three observers in their study. Tang [8] reported a higher ICC for intra- (0.990–0.996) and inter-rater (0.971) reliability of disc size. Data from 40 subjecsts were analyzed by two observers with a one-month interval between measurements.
Muscle CSAs showed slightly higher ICCs, however; they also have higher MAPDs. Higher MAPDs may have resulted from breathing artifacts during the MRI process. Valentin et al. [22] reported slightly higher inter-rater agreement for Psoas muscles (0.94, 0.92; left and right side respectively) and for Erector Spinae muscles (0.96, 0.93; left and right side respectively) based on the data from 10 male subjects collected between L1-L5 level. However, the authors did not report any data for intra-rater agreement.
All parameters that were estimated (e.g., cross-sectional areas, vertebral bodies etc.) demonstrated agreement (positive correlations) between analysts. The results indicated that there is always a positive trend (relationship) regardless of observation type (e.g., “best” [same analyst and same scan] case or “worst" [different analysts and different scans] case). Having a different analyst appeared to make a bigger difference than having a different scan, but agreements were good in all conditions. For example, when analyzing the same scan, analysts chose the identical MRI slice (among 50 slices) on which to perform their measurements 61% of the time and within one slice (3 mm) 90% of the time and were never different by more than 2 slices (6 mm). In other studies, the same slice is typically provided to the analysts, however, the present study demonstrates that, when free to choose, analysts will tend to converge on the same slice anyway [23]. This suggests that previous studies utilizing identical slices should only overestimate ICC slightly, further suggesting that MRI is a repeatable means for estimating structure dimensions.
Tracing muscle CSA is more difficult than vertebral bony structures due to less distinct structural boundaries. There were no significant differences between reading one’s scan and reading someone else’s (see Table 3).
Overall, scan/rescan agreement was very high, including “worst-case” scenarios comparing scans read and performed by different analysts. The results suggest that MRI-derived measurements are very consistent and repeatable.
