# 3D TractFormer: 3D Direct Volumetric White Matter Tract Segmentation with Hybrid Channel-Wise Transformer

**Authors:** Xiang Gao, Hui Tian, Xuefei Yin, Alan Wee-Chung Liew

PMC · DOI: 10.3390/s26031068 · Sensors (Basel, Switzerland) · 2026-02-06

## TL;DR

This paper introduces 3D TractFormer, a direct volumetric method for segmenting white matter tracts in diffusion MRI (dMRI). It uses a hybrid convolution-transformer network to improve accuracy while reducing memory and computational costs.

## Contribution

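The paper proposes a 3D direct volumetric segmentation approach with three components: a U-shaped network that deeply interleaves convolution and transformer blocks, a channel-wise transformer that combines depth-wise separable convolution with channel-wise attention, and a fully symmetric network trained on gradually sized volumetric patches.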

## Key findings

- The proposed 3D TractFormer outperforms current state-of-the-art methods on the largest publicly available tract-specific tractogram dataset.
- The hybrid architecture combines the spatial contextual features extracted by convolutions with the global long-distance dependencies captured by transformer blocks.
- The channel-wise transformer and training on volumetric patches reduce the memory and computational costs of processing 4D dMRI data.
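
To give a rough sense of why attending over channels rather than voxels can save memory, the sketch below compares the size of one dense attention matrix in each case. The feature-map shape (64 channels over a 32 × 32 × 32 patch) and the helper `attention_matrix_bytes` are illustrative assumptions, not values or code from the paper.

```python
def attention_matrix_bytes(num_tokens: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one dense num_tokens x num_tokens attention matrix."""
    return num_tokens * num_tokens * bytes_per_value


# Hypothetical feature map: 64 channels over a 32 x 32 x 32 volumetric patch.
channels = 64
depth, height, width = 32, 32, 32
voxels = depth * height * width  # 32768 spatial positions

# Spatial attention treats every voxel as a token; channel-wise attention
# treats every channel as a token.
spatial_bytes = attention_matrix_bytes(voxels)
channel_bytes = attention_matrix_bytes(channels)

print(f"spatial attention: {spatial_bytes / 2**30:.1f} GiB")  # 4.0 GiB
print(f"channel attention: {channel_bytes / 2**10:.1f} KiB")  # 16.0 KiB
```

Under these assumed sizes, the attention matrix shrinks from gigabytes to kilobytes per head, because its size depends on the number of channels rather than the number of voxels.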

## Abstract

Segmenting white matter tracts in diffusion-weighted magnetic resonance imaging (dMRI) is of vital importance for brain health analysis. It remains a challenging task due to the intersection and overlap of tracts (i.e., multiple tracts coexist in one voxel) and the data complexity of dMRI images (e.g., 4D data with high spatial resolution). Existing methods that demonstrate good performance implement direct volumetric tract segmentation by operating on individual 2D slices. However, this ignores 3D contextual information, requires additional post-processing, and struggles with the boundary handling of 3D volumes. Therefore, in this paper, we propose an efficient 3D direct volumetric segmentation method for segmenting white matter tracts. It has three key innovations. First, we propose to deeply interleave convolutions and transformer blocks in a U-shaped network, which effectively integrates their respective strengths to extract spatial contextual features and global long-distance dependencies for enhanced feature extraction. Second, we propose a novel channel-wise transformer, which integrates depth-wise separable convolution and compressed contextual feature-based channel-wise attention, effectively addressing the memory and computational challenges of 4D computing. Moreover, it helps to model global dependencies of contextual features and ensures each hierarchical layer focuses on complementary features. Third, we propose to train a fully symmetric network with gradually sized volumetric patches, which addresses the scarcity of 3D training samples and further reduces memory and computational costs. Experimental results on the largest publicly available tract-specific tractogram dataset demonstrate the superiority of the proposed method over the current state-of-the-art methods.
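
The channel-wise attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch in which each channel of a 3D feature map acts as one token. This is a generic formulation under stated assumptions, not the authors' implementation: the function `channel_attention` is hypothetical, plain channel-mixing matrices `w_q`, `w_k`, `w_v` stand in for the depth-wise separable convolutions the paper describes, the compression of contextual features is omitted, and L2 normalisation of queries and keys is one common choice rather than a detail taken from the paper.

```python
import numpy as np


def channel_attention(features, w_q, w_k, w_v):
    """Self-attention over channels of a 3D feature map.

    features: array of shape (C, D, H, W).
    w_q, w_k, w_v: (C, C) matrices standing in for learned projections.
    Returns (output of shape (C, D, H, W), attention weights of shape (C, C)).
    """
    c = features.shape[0]
    tokens = features.reshape(c, -1)  # (C, N) with N = D * H * W

    q = w_q @ tokens
    k = w_k @ tokens
    v = w_v @ tokens

    # Normalise each channel vector so the scores are cosine similarities.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)

    scores = q @ k.T  # (C, C): independent of the number of voxels
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax

    out = weights @ v  # (C, N): each channel becomes a mix of all channels
    return out.reshape(features.shape), weights


# Toy example with assumed sizes: 8 channels over a 4 x 4 x 4 patch.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4, 4))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out, weights = channel_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (8, 4, 4, 4) (8, 8)
```

The attention matrix here is C × C regardless of patch size, which is the property that makes this style of attention attractive for large volumetric inputs.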

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12900097/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12900097/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12900097/full.md

---
Source: https://tomesphere.com/paper/PMC12900097