# Variability in Volumetric Measures Between Different Versions of FreeSurfer

**Authors:** Jacqueline Rizzo, Hope Shimony, Charles D. Chen, Sarah J. Keefe, Kristine E. Shady, Rebecca L. Feldman, Jalen Scott, Thomas Hunter Smith, Kaitlyn Dombrowski, Ashlee Simmons, John C. Morris, David M. Holtzman, Brian A. Gordon, Tammie L.S. Benzinger, Shaney Flores

PMC · DOI: 10.1002/alz70856_106081 · Alzheimer's & Dementia · 2026-01-08

## TL;DR

The study finds significant differences in brain volume measurements between different versions of the FreeSurfer software, which could affect other imaging analyses.

## Contribution

The paper quantifies variability in volumetric measures between FreeSurfer versions, highlighting regions with the largest discrepancies.

## Key findings

- Test-retest variability showed significant intercept differences for two T1 scans.
- Inter-version intercepts for cortical and subcortical volumes were more variable than test-retest offsets.
- Regions like cerebellum white matter and parahippocampal showed the largest inter-version differences.

## Abstract

FreeSurfer is a widely‐used program for segmenting cortical and subcortical regions of interest (ROIs) from magnetic resonance imaging (MRI) scans. Regional volumetric measures can be calculated from these ROIs and other imaging modalities may use the ROI segmentations for quantification. FreeSurfer has recently undergone several updates to improve performance. Here, we compare volumetric measures across different FreeSurfer versions.

Two T1‐weighted structural head MRI scans were acquired within the same session for 18 cognitively unimpaired participants (mean age: 63.7 years, ages: 48‐76 years, 14 females) with a Clinical Dementia RatingÒ of 0 on a Siemens 3T Trio MRI scanner. To examine test‐retest variability, slopes and intercepts were extracted from general linear models for regional volumetric measures after processing both scans through a Dockerized FreeSurfer‐5.3 container. For exploring inter‐version variability, the first T1 scan was processed through a Dockerized FreeSurfer‐7.4 container. Slopes and intercepts were extracted from general linear models predicting regional FreeSurfer‐7.4 measures from the associated 5.3 measures from the first T1 scan. Intercepts were assessed to determine if there was a significant offset in our test‐retest group and between FreeSurfer versions.

Volumetric measures for our test‐retest group showed significantly variable intercepts for two separate T1 scans. While our inter‐version comparison showed similarly strong linear association, inter‐version intercepts that removed the test‐retest offset were still significantly more variable for cortical and subcortical volumes (Figure 1). Some of the largest inter‐version intercept differences were observed for cerebellum white matter (left: t=3.77, p <.001; right: t=3.18, p <.001), parahippocampal (left: t=3.20, p <.001; right: t=4.78, p <.001), and right inferior parietal (t=2.50, p = .002) volumes (Figure 2 and Figure 3).

Volumetric measures varied significantly between FreeSurfer‐5.3 and 7. Regions with the largest differences, such as cerebellum cortex, should be investigated for their impact on other imaging modalities, such as positron emission tomography, that use those regions for quantification.

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12782150/full.md

---
Source: https://tomesphere.com/paper/PMC12782150