Automated 3D segmentation of human vagus nerve fascicles and epineurium from micro-computed tomography images using anatomy-aware neural networks

Jichu Zhang; Maryse Lapierre-Landry; Havisha Kalpatthi; Michael W Jenkins; David L Wilson; Nicole A Pelot; Andrew J Shoffstall

PMC · DOI: 10.1088/1741-2552/ae33f6 · January 20, 2026

Open Access

TL;DR

This paper presents a deep learning method for accurately segmenting human vagus nerve structures in 3D microCT images, improving the precision of nerve morphology analysis for peripheral nerve stimulation therapies.

Contribution

A novel anatomy-aware 3D U-Net with a custom loss function for accurate and efficient segmentation of vagus nerve fascicles and epineurium in microCT images.

Findings

01

The 3D U-Net achieved an average Dice similarity coefficient of 0.93 for segmentation.

02

The 3D approach reduced anatomical errors by 2.5-fold and improved fascicle split/merge detection by nearly 6-fold.

03

The method provides high-throughput, anatomically accurate 3D maps of peripheral nerve morphology.

Abstract

Objective. Precise segmentation and quantification of nerve morphology from imaging data are critical for designing effective and selective peripheral nerve stimulation (PNS) therapies. However, prior studies on nerve morphology segmentation suffer from important limitations in both accuracy and efficiency. This study introduces a deep learning approach for robust and automated three-dimensional (3D) segmentation of human vagus nerve fascicles and epineurium from high-resolution micro-computed tomography (microCT) images. Methods. We developed a multi-class 3D U-Net to segment fascicles and epineurium that incorporates a novel anatomy-aware loss function to ensure that predictions respect nerve topology. We trained and tested the network using subject-level five-fold cross-validation with 100 microCT volumes (11.4 μm isotropic resolution) from cervical and thoracic vagus nerves stained…

Linked entities


Species (1)

Homo sapiens

Chemicals (1)

phosphotungstic acid

Figures (6)

Ground truth annotation, cross-validation strategy, and 3D U-Net architecture. (a) Example of a raw microCT volume and corresponding manual segmentation of fascicles and epineurium. Scale bar is 500 μm. (b) The five-fold leave-one-out cross-validation approach used for training and evaluation. Each fold used four subjects (80 images) for training and one subject (20 images) for testing. (c) Multi-class 3D U-Net architecture for fascicle and epineurium segmentation. Conv, convolution; IN, instance normalization; ReLU, rectified linear unit.

Overview of the anatomy-aware network training and compound loss computation. (a) Identification of voxels in the automated segmentation that do not meet anatomical constraints, i.e. fascicles must be completely enclosed by epineurium and fascicles cannot directly contact background. The anatomy-aware module identifies regions in the 3D U-Net prediction where anatomical constraints are violated and generates a mask of critical voxels $V$ that highlights areas requiring focused supervision during training. (b) Computation of the compound loss function. The network is trained using three loss terms: cross-entropy loss ($L_{\mathrm{CE}}$) for voxel-wise accuracy, Dice loss ($L_{\mathrm{Dice}}$) for region overlap, and topology loss ($L_{\mathrm{Topo}}$) for a priori anatomical knowledge. The topology loss is computed by applying cross-entropy loss only to regions identified in the critical voxel map $V$. ⊙ denotes the Hadamard product. Scale bars are 500 μm.

3D U-Net yields more accurate automated segmentations of vagus nerve fascicles and epineurium from microCT than a 2D U-Net. (a) Representative results comparing 3D and 2D U-Nets for three nerve samples ranked by 3D U-Net Dice similarity coefficient (DSC) at (i) high (90th percentile), (ii) medium (50th percentile), and (iii) low (20th percentile) performance. Images show 3D rendered volumes (top row of each subpanel) and corresponding middle cross sections (bottom row of each subpanel) of raw microCT, ground truth (GT), and predictions from 3D and 2D U-Nets. Scale bars are 1 mm. (b)–(d) Segmentation metrics for each class (fascicles or epineurium) across five cross-validation folds (N = 100 images total). Plots show mean values with 95% confidence intervals; individual fold means are shown as gray dots connected by dashed lines. Performance metrics include: (b) DSC, (c) surface DSC (single-pixel tolerance), and (d) average symmetric surface distance (ASSD, in μm). ↑, higher is better; ↓, lower is better; ****, p < 0.0001.

3D U-Net improves detection of individual fascicles in nerve cross sections. (a) MicroCT cross sections and fascicle segmentations by 3D and 2D U-Nets for two nerve samples ranked by 3D U-Net’s fascicle F1 score at (i) high (90th percentile) and (ii) low (20th percentile) performance. Fascicle F1 score (at intersection-over-union [IoU] threshold = 0.7) and Dice similarity coefficient (DSC) are labeled. Scale bars are 1 mm. Segmented fascicles (yellow) are overlaid with ground truth fascicles (GT, white dashed lines) with per-fascicle IoU values (omitted for false positives/negatives). (b) Mean fascicle F1 score with 95% confidence interval (CI) at various IoU thresholds for 3D and 2D U-Nets. A higher IoU threshold means a stricter overlap criterion for matching. (c) Comparison of per-fascicle over-segmentation (left) and under-segmentation (right) rates. (d) Fascicle size classification into four categories (Buyukcelik et al 2023) (see distribution in figure S4 in supplementary note 5). $d_{\mathrm{eff}}$, effective circular diameter. Scale bar is 1 mm. (e) Percentage of missed fascicles. In (c) and (e), plots show mean values with 95% CIs; individual fold means are shown as gray dots connected by dashed lines. ↑, higher is better; ↓, lower is better; ****, p < 0.0001; ***, p < 0.001; **, p < 0.01; *, p < 0.05; ns, not significant.

3D U-Net preserves fascicle connectivity and reduces anatomical violations. (a) Predicted fascicle skeletons (red) vs. ground truth (GT, gray) for two examples ranked by 3D U-Net centerline Dice (clDice) score at (i) high (90th percentile) and (ii) medium (50th percentile) performance. clDice and average Dice similarity coefficient (DSC) values are labeled. Quantitative comparisons of (b) fascicle connectivity measured by clDice scores and (c) anatomical error rates (%) between 3D and 2D U-Nets. Plots show mean values with 95% confidence intervals; individual fold means are shown as gray dots connected by dashed lines. (d) Visualization of errors violating expected nerve anatomy (red voxels) for two samples ranked by 3D U-Net error rate at (i) low (20th percentile) and (ii) high (90th percentile) violation levels. Images show cross-sectional and 3D views, illustrating typical errors like broken and discontinuous fascicles. Cross section locations are marked by yellow dashed boxes on the 3D volumes. Voxel error rates and average DSC of example images are labeled. Scale bars are 1 mm. ↑, higher is better; ↓, lower is better; ****, p < 0.0001.

3D U-Net improves accuracy of longitudinal fascicular structure, including inter-slice jitter and split/merge events. (a) Boundary F1 (BF score) comparison for fascicles (left) and epineurium (right) across 3D/2D predictions and ground truth (GT). Higher scores indicate better boundary consistency between consecutive cross sections. (b) Fascicle split/merge event deviation from GT (%), measured as percentage difference between predicted and GT event frequency (number of splits and merges per millimeter). In (a) and (b), plots show mean values with 95% confidence intervals; individual fold means are shown as gray dots connected by dashed lines. ↑, higher is better; ↓, lower is better; ****, p < 0.0001. (c)–(e) Example nerve sample showing: (c) 3D renderings of fascicle segmentations from 3D U-Net (blue), 2D U-Net (orange), and GT (gray); the dashed arrow marks the location of the origin in (d) and (e). Scale bars are 1 mm. (d) BF score profiles along the example nerve sample in panel (c). (e) Distribution of fascicle merge (red) and split (blue) events for the example nerve sample in panel (c).

Funding (4)

- Case Western Reserve University (10.13039/100008136)
- National Institutes of Health (10.13039/100000002)
- Cleveland VA APT Center
- US Department of Veterans Affairs

Keywords

image segmentation; convolutional neural network; micro-computed tomography; vagus nerve

Taxonomy

Topics: Vagus Nerve Stimulation Research · Neuroscience and Neural Engineering · Nerve injury and regeneration

Full text

Introduction

Peripheral nerve stimulation (PNS) targeting somatic nerves enables the restoration of movement and sensation following spinal cord injury or limb loss (Taghlabi et al 2024). PNS targeting autonomic nerves enables treatment of a wide variety of diseases, such as sacral nerve stimulation to restore bladder and bowel function (Assmann et al 2020) and vagus nerve stimulation (VNS) for numerous indications (Johnson and Wilson 2018, Pavlov and Tracey 2022). The vagus nerve provides extensive sensory and motor innervation between the brainstem and the viscera and critically regulates physiological functions and homeostasis (Neuhuber and Berthoud 2021). Implanted VNS is FDA-approved to treat epilepsy (Ben-Menachem et al 1994), depression (Sackeim et al 2001), obesity (Ikramuddin et al 2014), and stroke sequelae (Dawson et al 2021). Ongoing studies are investigating VNS for treating heart failure (Konstam et al 2019), chronic pain (Shao et al 2023), and inflammatory disorders (Bonaz et al 2021). Despite this broad clinical potential, the efficacy of current VNS approaches is limited by their non-selective stimulation of the entire vagus nerve trunk (Fitchett et al 2021). This lack of selectivity frequently leads to off-target stimulation and adverse effects—including coughing, throat pain, hoarseness, and dyspnea—which compromise clinical outcomes (Ben-Menachem 2001). Consequently, VNS techniques with improved selectivity are needed to precisely target desired pathways while avoiding those that cause side effects.

Developing selective PNS therapies remains challenging due to an incomplete understanding of the targeted neuroanatomy (Fitchett et al 2021). Micro-computed tomography (microCT) is an ex vivo nerve imaging technique that can resolve the three-dimensional (3D) structure of fascicles (bundles of fibers) at high resolution (∼5–40 μm voxel spacing) with large fields of view (∼10 cm) (Thompson et al 2020, Upadhye et al 2022, Jayaprakash et al 2023). Recent advances in nerve staining for microCT have further improved fascicular contrast and maintained compatibility with histology (Upadhye et al 2025a). Thus, microCT is a key imaging modality for characterizing nerve morphology.

To inform the development of PNS therapies, microCT images must be segmented to extract the nerve (epineurium) and fascicle boundaries. The segmentations enable quantification of the neuroanatomy, including nerve and fascicle diameters (Pelot et al 2020) and morphological changes due to splits and merges of fascicles (Upadhye et al 2022). They also enable the analysis of the nerve’s functional organization (Jayaprakash et al 2023, Thompson et al 2023). These insights are key to identifying novel points of intervention and designing function- or organ-specific neuromodulation. The segmentations can also serve as inputs for anatomically realistic computational models of PNS. By predicting nerve fiber responses to stimulation, computational models inform the design of electrode geometry, electrode placement, and stimulation parameters to achieve targeted neural responses (Wongsarnpigoon and Grill 2010, Butson et al 2011, Kent and Grill 2013, Aristovich et al 2021, Ciotti et al 2024, Tebcherani et al 2024, Musselman et al 2025). Therefore, segmentation of nerve morphology is fundamental to quantitative mapping of vagal pathways and drives the design of novel PNS therapies.

Accurate segmentation of nerve morphology is essential, but existing methodologies are limited in throughput and accuracy. Manual or semi-automated segmentation is commonly applied to histological (Verlinden et al 2016, Pelot et al 2020, Settell et al 2020) and microCT (Kronsteiner et al 2022, Thompson et al 2023) images but is labor-intensive. Nerve morphology varies substantially between individuals (Brill and Tyler 2017, Pelot et al 2020, Upadhye et al 2022); therefore, large-scale analyses are warranted, and they require high-throughput automated segmentation techniques such as convolutional neural networks (CNNs) (Rizwan I Haque and Neubert 2020). Recent studies have used two-dimensional (2D) CNNs to segment vagus nerve fascicles in both microCT (Buyukcelik et al 2023, Jayaprakash et al 2023) and histological (Verardo et al 2025) images. However, because these networks process data slice by slice, they have limited spatial context and risk creating inaccurate representations of the highly plexiform 3D vagal morphology (Stewart 2003, Upadhye et al 2022). Further, prior CNN-based microCT studies treated nerve segmentation as a per-pixel classification task and evaluated it with generic metrics, such as the Dice score, that overlook the expected anatomical structures (Stewart 2003). Therefore, existing 2D CNNs are unlikely to segment microCT images of nerves with the quality demanded by downstream applications without extensive manual refinement. Important applications include morphological measurements (Grinberg et al 2008, Schiefer et al 2012, Brill and Tyler 2017, Pelot et al 2020), fascicle tracking (Sunderland 1945, Upadhye et al 2022, Jayaprakash et al 2023, Thompson et al 2023), and computational modeling (Brill and Tyler 2011, Schiefer et al 2012, Musselman et al 2023).
Together, these limitations highlight the need for a new automated segmentation approach that is designed and validated based on its ability to create accurate, functionally useful 3D anatomical maps of nerve morphology to inform PNS development.

In this study, we present an enhanced deep learning approach for accurate segmentation of fascicles and epineurium from microCT images of cadaveric human vagus nerves. Our approach features a 3D U-Net CNN that leverages the volumetric context inherent in high-resolution microCT data. We guided network training with a novel anatomy-aware loss function that incorporates structural constraints of the nerve. We benchmarked the segmentation accuracy of our 3D U-Net against a published 2D U-Net (Buyukcelik et al 2023) using five-fold cross-validation with both standard and anatomy-aware metrics. Our 3D approach achieved significantly better spatial overlap, boundary delineation, and detection of individual fascicles. The resulting 3D segmentations also showed improved preservation of fascicle connectivity, fewer nerve morphological errors, and more consistent boundaries across slices. By overcoming key limitations of prior techniques, our segmentation pipeline provides a high-throughput tool for generating realistic 3D anatomical maps of the vagus nerve and is adaptable to other PNS targets. These improved segmentations facilitate analyses that can advance our understanding of vagal neuroanatomy and accelerate the development of next-generation VNS therapies.

Methods

This study is part of the Reconstructing Vagal Anatomy multimodal vagus nerve mapping project (Pelot et al 2025, Zhang et al 2025).

Statistical analyses and plotting were conducted in R (version 4.3.3) using rstatix (Kassambara 2023) and ggplot2 (Wickham 2016), respectively. Image processing, model training, and evaluation were conducted using Python 3.10 and PyTorch 2.1.2 on an NVIDIA RTX A6000 GPU (48 GB memory; Santa Clara, CA) with CUDA 12.1 and 16-bit automatic mixed precision.

2.1. Tissue collection and preparation

We collected human vagus nerves from five embalmed cadavers (2 female, 3 male; 57–81 years) donated to the Anatomical Gift Program at Case Western Reserve University (table S1). This study received a non-human subject determination from the Case Western Reserve University Institutional Review Board because the work involved only deceased individuals and did not include identifiable private information, and therefore did not meet the U.S. federal definition of human subjects research.

We dissected the vagus nerve bilaterally from the inferior border of the jugular foramen to the superior end of the esophageal plexus. We mounted the excised nerves on custom acrylic boards (3 cm wide, up to 9 cm long); the nerve tissue on each board was termed a ‘sample’. We stained nerve samples with 3% phosphotungstic acid (PTA; HT152-250ML, Sigma-Aldrich, St. Louis, MO) for 24 h with gentle agitation at 50 rpm on an orbital shaker (SI-M1500, Scientific Industries, Bohemia, NY) (Upadhye et al 2025a, 2025b). After staining, we covered the samples with gauze saturated with 1X phosphate-buffered saline (PBS; BP399-1, Fisher Scientific, Hampton, NH) and stored them in sealed containers at 4 °C for 0.5–4 d.

2.2. MicroCT image acquisition

We loaded the nerve samples mounted on their acrylic boards into a cylindrical sample tube (34 mm diameter × 110 mm height, U50825, Scanco Medical AG, Brüttisellen, Switzerland). We performed microCT scanning using a μCT 100 cabinet scanner (Scanco Medical AG) with the following parameters: x-ray voltage 55 kV, current 145 μA, integration time 500 ms, and 0.5 mm aluminum filter. These parameters were selected to balance image quality with scan duration, reducing the risk of tissue dehydration (Upadhye et al 2025a). We scanned all samples with a circular field of view of 35.2 mm diameter (9.73 cm² area) at 11.4 μm isotropic voxel spacing. Scanning each tube (containing two nerve samples) required ∼14 h. We reconstructed the scans using the manufacturer’s proprietary software and exported them as series of 16-bit DICOM images (3072 × 3072 pixels). For more efficient file storage and analysis, we compiled the DICOM image data for each sample into 3D OME-Zarr arrays (Moore et al 2021) at the original resolution.

2.3. Ground truth annotation

We created a dataset with 100 non-overlapping microCT volumes evenly distributed across the five cadavers (20 volumes per cadaver). Each volume was 64 × 1536 × 3072 voxels (z, y, x) in size with an isotropic voxel spacing of 11.4 μm; the x and y dimensions captured the full nerve cross section in all slices, and the z dimension spanned 0.73 mm per volume. We selected these volumes from representative cervical and thoracic regions containing fascicle splits and merges, and we avoided areas with excessive tissue damage. We exported each volume at the original resolution as a 16-bit TIFF stack for ground truth annotation.
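As a quick consistency check on the volume dimensions above, the physical extent of each annotated volume is its voxel count multiplied by the 11.4 μm spacing. The helper `extent_mm` is hypothetical and does only this arithmetic. The x extent (≈35.0 mm) fits just inside the 35.2 mm circular field of view.

```python
# Isotropic voxel spacing from the acquisition settings, in micrometres.
VOXEL_UM = 11.4


def extent_mm(shape_voxels, spacing_um=VOXEL_UM):
    """Hypothetical helper: convert a (z, y, x) voxel shape to millimetres."""
    return tuple(round(n * spacing_um / 1000, 2) for n in shape_voxels)


print(extent_mm((64, 1536, 3072)))  # prints (0.73, 17.51, 35.02)
```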

We generated ground truth annotations of fascicles and epineurium for the 100 microCT volumes from five cadaveric subjects to train the 3D U-Net (figure 1(a)). We manually traced fascicle boundaries every 10 cross sections in each volume using 3D Slicer (version 5.6.1) (Fedorov et al 2012), interpolated between manually segmented slices using the ‘Fill between slices’ function, and reviewed and edited the segmentations at each fascicle split or merge.

We segmented the epineurium semi-automatically using the Segment Anything for Microscopy (µSAM) package (Archit et al 2025). We first applied the built-in vision transformer base model to interactively segment the epineurium every 16 cross sections in a volume followed by manual corrections. Then, we used the ‘Automatic tracking’ function to propagate these ‘seed’ masks across the entire image stack with the following parameters: intersection-over-union (IoU) threshold = 0.5, motion smoothing factor = 0.5, bounding box extension = 0.05.

We combined the fascicle and epineurium segmentations into an 8-bit TIFF stack at the original image resolution. Each voxel was classified as fascicle, epineurium, or background; fascicle labels overwrote epineurium in cases of overlap. We validated each segmentation against the raw microCT volume, verifying boundary accuracy, fascicle splits and merges, and structural continuity across slices.
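The label-merging rule can be sketched in a few lines of NumPy. The integer encoding (0 = background, 1 = fascicle, 2 = epineurium) is hypothetical; the text does not state which values were stored in the 8-bit stack. Fascicle labels win overlaps because the fascicle mask is written last.

```python
import numpy as np

# Hypothetical label encoding for the merged ground-truth stack.
BACKGROUND, FASCICLE, EPINEURIUM = 0, 1, 2


def merge_labels(fascicle_mask, epineurium_mask):
    """Combine boolean masks into one uint8 label map (fascicle overrides epineurium)."""
    labels = np.full(fascicle_mask.shape, BACKGROUND, dtype=np.uint8)
    labels[epineurium_mask] = EPINEURIUM
    labels[fascicle_mask] = FASCICLE  # written last, so it overwrites any overlap
    return labels
```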

2.4. Network and loss function

2.4.1. Network architecture

Our neural network is based on published 3D U-Net architectures (Ronneberger et al 2015, Çiçek et al 2016) and was implemented using the nnU-Net framework (Isensee et al 2021), which automatically configured the architecture and hyperparameters based on properties of the input dataset. The network processed single-channel grayscale microCT volumes and output three-class segmentation maps wherein each voxel was identified as background, fascicle, or epineurium.

The network featured a 7-stage encoder-decoder with progressive feature dimensions of 32, 64, 128, 256, 320, 320, and 320 (figure 1(c)). Each stage had two 3 × 3 × 3 convolution layers with varying downsampling strides: (1,1,1) for stage 1, (2,2,2) for stages 2–5, and (1,2,2) for stages 6 and 7. The decoder path mirrored the encoder structure with skip connections from corresponding stages, maintaining consistent kernel sizes and strides except at the bottleneck layer. We used instance normalization and leaky ReLU activation throughout the network without dropout.
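The stride schedule determines how far the feature maps shrink. The sketch below traces (z, y, x) feature-map shapes through the seven encoder stages for the 32 × 256 × 256 training patch used in section 2.5. It assumes each stage's stride is applied once, as in standard U-Net encoders. It is arithmetic only, not the nnU-Net implementation.

```python
# Per-stage feature widths and (z, y, x) strides as listed in the text.
FEATURES = [32, 64, 128, 256, 320, 320, 320]
STRIDES = [(1, 1, 1)] + [(2, 2, 2)] * 4 + [(1, 2, 2)] * 2


def encoder_shapes(patch=(32, 256, 256)):
    """Return (channels, (z, y, x)) after each encoder stage for a given patch size."""
    shape = list(patch)
    out = []
    for channels, stride in zip(FEATURES, STRIDES):
        # Each stride divides the spatial size along its axis.
        shape = [s // k for s, k in zip(shape, stride)]
        out.append((channels, tuple(shape)))
    return out


for stage, (c, s) in enumerate(encoder_shapes(), start=1):
    print(f"stage {stage}: {c} channels, feature map {s}")
```

Under these assumptions, the bottleneck operates on a 2 × 4 × 4 grid with 320 channels. The (1, 2, 2) strides in stages 6 and 7 stop downsampling along z once only two slices remain.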

2.4.2. Anatomy-aware loss function

To ensure anatomically plausible segmentation of nerve structures, we adopted an anatomy-aware module based on the topological interaction framework (Gupta et al 2022). The module enforced two anatomical constraints: (1) fascicles must be completely enclosed by epineurium, and (2) fascicles cannot directly contact background voxels. Using 3D convolution operations based on 26-neighbor voxel connectivity, the module identified voxels where these anatomical rules were violated (figure 2(a)). The resulting critical voxel map V highlighted regions that required additional supervision during training. We integrated the critical voxel map into the training by applying a loss penalty specifically to the identified violation regions, allowing the network to learn anatomically consistent representations through end-to-end optimization.
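The critical voxel search can be sketched as binary dilation over a 3 × 3 × 3 (26-neighbor) cube. This is equivalent to convolving a mask with an all-ones 3 × 3 × 3 kernel and thresholding above zero. The NumPy code below is an illustration only, not the authors' or Gupta et al's implementation. It assumes a hypothetical label encoding (0 = background, 1 = fascicle, 2 = epineurium) applied to an argmax label map, and it flags both voxels of every fascicle–background contact.

In this three-class setting, forbidding fascicle–background contact is what forces every fascicle to be separated from background by epineurium. Voxels outside the volume are treated as non-background, so fascicles leaving through the first or last slice are not flagged.

```python
import numpy as np

BACKGROUND, FASCICLE, EPINEURIUM = 0, 1, 2  # hypothetical label encoding


def dilate26(mask):
    """Binary dilation with a 3x3x3 cube (26-connectivity) via shifted ORs."""
    padded = np.pad(mask, 1, constant_values=False)  # outside the volume = False
    out = np.zeros_like(mask)
    Z, Y, X = mask.shape
    for dz in (0, 1, 2):
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                out |= padded[dz:dz + Z, dy:dy + Y, dx:dx + X]
    return out


def critical_voxels(labels):
    """Boolean map V: fascicle and background voxels that are 26-neighbors of each other."""
    fascicle = labels == FASCICLE
    background = labels == BACKGROUND
    return (fascicle & dilate26(background)) | (background & dilate26(fascicle))
```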

We incorporated anatomical constraints into network training through a compound loss function (figure 2(b)) with three components: cross-entropy loss ($L_{\mathrm{CE}}$) for voxel-wise classification accuracy, Dice loss ($L_{\mathrm{Dice}}$) for optimizing region overlap, and topology loss ($L_{\mathrm{Topo}}$) for enforcing anatomical constraints. The topology loss was computed by applying cross-entropy loss specifically to regions identified in the critical voxel map $V$:

$$L_{\mathrm{Topo}} = \mathrm{CE}\left(p \odot V,\ g \odot V\right)$$

where $p$ is the predicted segmentation, $g$ is the ground truth, and $\odot$ is the Hadamard product. The final loss function combines all components:

$$L = L_{\mathrm{CE}} + L_{\mathrm{Dice}} + \lambda L_{\mathrm{Topo}}$$

with $\lambda = 1 \times 10^{6}$ to weight the topological component of the 3D prediction (Gupta et al 2022). This compound loss function was designed to promote both global segmentation quality and appropriate anatomical relationships between neural structures.
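A forward-only NumPy sketch of $L = L_{\mathrm{CE}} + L_{\mathrm{Dice}} + \lambda L_{\mathrm{Topo}}$ for a single volume, for illustration. Actual training would use autograd tensors (e.g. PyTorch).

Two choices here are assumptions not stated in the text:
- $\mathrm{CE}(p \odot V, g \odot V)$ is read as the voxel-wise cross-entropy multiplied by $V$ and averaged over all voxels.
- The soft Dice term is averaged over all three classes, including background. Implementations differ on whether background is included.

```python
import numpy as np


def log_softmax(logits, axis=0):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def compound_loss(logits, target, critical, lam=1e6, eps=1e-5):
    """L = L_CE + L_Dice + lam * L_Topo.

    logits: float (C, Z, Y, X); target: int labels (Z, Y, X);
    critical: bool critical voxel map V, shape (Z, Y, X).
    """
    num_classes = logits.shape[0]
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)
    # One-hot ground truth g with the class axis first: (C, Z, Y, X).
    onehot = np.eye(num_classes)[target].transpose(3, 0, 1, 2)
    # Voxel-wise cross-entropy: -log of the probability of the true class.
    ce_map = -(log_probs * onehot).sum(axis=0)
    l_ce = ce_map.mean()
    # Soft Dice loss averaged over classes (background included in this sketch).
    inter = (probs * onehot).sum(axis=(1, 2, 3))
    denom = probs.sum(axis=(1, 2, 3)) + onehot.sum(axis=(1, 2, 3))
    l_dice = 1.0 - np.mean((2.0 * inter + eps) / (denom + eps))
    # Topology term: cross-entropy kept only where V is set (assumed normalization).
    l_topo = (ce_map * critical).mean()
    return l_ce + l_dice + lam * l_topo
```

Because $V$ is typically sparse, the masked term is small relative to the other two. A large $\lambda$ keeps it influential, though the exact normalization in the original implementation is not specified here.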

2.5. Network training and inference

We implemented a five-fold leave-one-out cross-validation strategy (figure 1(b)). The dataset of 100 annotated microCT volumes was divided at the subject level to prevent data leakage. In each fold, we used 80 volumes from four subjects for training and the 20 volumes from the remaining subject for testing. This subject-wise partitioning allowed us to evaluate how well the network generalized across anatomical variation and differences in tissue, staining, and imaging quality.
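Subject-level splitting can be sketched as a generator that holds out one subject per fold. The subject and volume identifiers below are hypothetical placeholders that only mirror the 5 × 20 dataset size.

```python
def subject_folds(volumes_by_subject):
    """Yield (held_out_subject, train_volumes, test_volumes), one fold per subject."""
    for held_out in volumes_by_subject:
        train = [v for subject, vols in volumes_by_subject.items()
                 if subject != held_out for v in vols]
        yield held_out, train, list(volumes_by_subject[held_out])


# Hypothetical IDs: 5 subjects x 20 volumes each.
volumes = {f"S{s}": [f"S{s}_vol{i:02d}" for i in range(20)] for s in range(1, 6)}
for subject, train, test in subject_folds(volumes):
    print(subject, len(train), len(test))  # each fold: 80 train, 20 test volumes
```

No volume from the held-out subject ever appears in that fold's training list. This is the leakage the subject-level split is meant to prevent.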

Our 3D U-Net was trained using patches of 32 × 256 × 256 voxels (0.36 × 2.92 × 2.92 mm³) randomly sampled from the full-resolution volumes. We ensured that 33% of the training patches contained at least one foreground class (i.e., fascicle or epineurium). Before training, all input images were normalized by clipping the original 16-bit intensities to [0, 32 767] and rescaling to [0, 1]. The clipping removed negative intensities, which were reconstruction artifacts that occurred outside the tissue and contained no anatomical information. To reduce overfitting, we applied an extensive set of data augmentation techniques during training, including random rotation, random Gaussian noise, and contrast adjustments; see figure S1 (supplementary note 2) for example augmentations and table S2 (supplementary note 2) for the complete list of data augmentations with parameter ranges and probabilities.
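The intensity normalization can be sketched as below. This assumes the volumes are stored as signed 16-bit integers (int16), which is consistent with the negative intensities mentioned and with 32 767 being the int16 maximum. The function name is hypothetical.

```python
import numpy as np


def normalize(volume):
    """Clip to [0, 32767] (dropping negative reconstruction artifacts), rescale to [0, 1]."""
    clipped = np.clip(volume.astype(np.float32), 0.0, 32767.0)
    return clipped / np.float32(32767.0)


raw = np.array([-1200, 0, 16384, 32767], dtype=np.int16)
print(normalize(raw))
```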

We trained the 3D U-Net for 500 epochs with a batch size of 4 patches, where each epoch consisted of 250 mini-batch iterations (see figure S2 in supplementary note 2 for example training curves). We used stochastic gradient descent with Nesterov momentum ($\mu = 0.99$) as the optimizer. The initial learning rate was set to 0.01 and followed a polynomial decay schedule, in which the learning rate at epoch $e$ was the initial learning rate multiplied by $(1 - e/e_{\mathrm{max}})^{0.9}$, where $e_{\mathrm{max}}$ is the total number of epochs. After each training epoch, we performed in-training validation on 50 patches randomly extracted from the validation set rather than on full-size images, which avoided the computational overhead of full-image inference during training.
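A minimal sketch of the polynomial schedule and the training budget it implies. This assumes the decay factor scales the initial rate (not the previous epoch's rate) and is evaluated once per epoch. PyTorch also offers `torch.optim.lr_scheduler.PolynomialLR`; whether the authors used it is not stated.

```python
INITIAL_LR = 0.01
MAX_EPOCHS = 500
ITERS_PER_EPOCH = 250
BATCH_SIZE = 4


def poly_lr(epoch, initial_lr=INITIAL_LR, max_epochs=MAX_EPOCHS, exponent=0.9):
    """Polynomial decay: the initial rate scaled by (1 - e / e_max) ** exponent."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


# Optimizer steps and patches seen over the full schedule.
total_steps = MAX_EPOCHS * ITERS_PER_EPOCH  # 125,000 mini-batches
total_patches = total_steps * BATCH_SIZE    # 500,000 patches
for e in (0, 250, 499):
    print(e, poly_lr(e))
```

The rate starts at 0.01, falls to about 0.0054 halfway through training, and approaches zero in the final epoch.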

During full-image inference, we extracted patches with 25% overlap in all dimensions and merged the overlapping predictions using a sliding-window approach with Gaussian blending (σ = 0.125), which weights voxels near each patch center more heavily than those near patch edges.
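Below is a sketch of sliding-window placement and Gaussian blending. It assumes σ is expressed as a fraction of the patch edge length (σ = 0.125 × edge). It also assumes windows are spaced evenly so that neighbors overlap by at least 25% and the last window ends at the volume border. All helper names are hypothetical, and this is not nnU-Net's internal code. The 1-D `blend_1d` demo assumes the axis is at least one patch long.

```python
import numpy as np


def window_starts(size, patch, min_overlap=0.25):
    """Evenly spaced window starts along one axis with at least `min_overlap` overlap."""
    if size <= patch:
        return [0]
    max_step = patch * (1 - min_overlap)
    n = int(np.ceil((size - patch) / max_step)) + 1
    # Spread n windows so the first starts at 0 and the last ends at the border.
    return [int(round(i * (size - patch) / (n - 1))) for i in range(n)]


def gaussian_weights(patch_shape, sigma_scale=0.125):
    """Separable Gaussian map peaking at the patch center; sigma = sigma_scale * edge."""
    weight = np.ones(patch_shape)
    for axis, s in enumerate(patch_shape):
        coords = np.arange(s) - (s - 1) / 2
        profile = np.exp(-coords ** 2 / (2 * (sigma_scale * s) ** 2))
        shape = [1] * len(patch_shape)
        shape[axis] = s
        weight = weight * profile.reshape(shape)
    return weight / weight.max()


def blend_1d(size, patch, predict):
    """Toy 1-D demo (size >= patch): weighted average of overlapping patch predictions."""
    acc, wsum = np.zeros(size), np.zeros(size)
    w = gaussian_weights((patch,))
    for start in window_starts(size, patch):
        acc[start:start + patch] += w * predict(start)
        wsum[start:start + patch] += w
    return acc / wsum
```

Dividing by the accumulated weights means a voxel covered by several patches receives a weighted average. Predictions from near a patch center dominate those from its edges, where context is weakest.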

2.6. 3D U-Net vs. 2D U-Net comparison

We compared the performance of our 3D U-Net to a published 2D U-Net that was also implemented to segment the fascicular morphology of human vagus nerves from microCT images (Buyukcelik et al 2023) (figure S3). We optimized the 2D U-Net’s hyperparameters and trained it on the same dataset using identical cross-validation splits, preprocessing steps, and data augmentation techniques. For the 2D U-Net, we used the Adam optimizer with a constant learning rate of 0.0001 and a batch size of 8 patches of size 512 × 512 pixels. The sampling strategy ensured that 67% of the patches contained at least one foreground class (i.e., fascicle or epineurium). We performed a grid hyperparameter search to determine the optimal learning rate, batch size, and patch size. For in-training validation, an 80/20 split was performed at the volume level (instead of slice-wise) to prevent data leakage from similar, adjacent cross sections.

During inference, the 2D U-Net processed raw microCT volumes slice-by-slice with 512 × 512 patches extracted with 25% overlap in both dimensions. We merged the patches using the same sliding window approach as the 3D U-Net. We stacked the slice-wise predictions into 3D volumes for evaluation and visualization.

We compared the networks using paired, two-sided Wilcoxon signed-rank tests (α = 0.05) for each metric. We reported 95% confidence intervals derived from bootstrap resampling (1000 resamples) and p-values with a Bonferroni correction for multiple comparisons.
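The authors performed their statistics in R (rstatix). The Python sketch below only illustrates the two resampling and correction steps. The text does not say which bootstrap variant was used or what was resampled, so a percentile bootstrap over per-image metric values is an assumption here. The paired Wilcoxon test itself could be run with `scipy.stats.wilcoxon`, which is not shown.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (resampling with replacement)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    low, high = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return low, high


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```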

Ablation study

2.7.

To evaluate the contribution of the anatomy-aware component of the loss function, we performed an ablation study by training an identical 3D U-Net architecture with only the conventional loss terms: Dice loss ($L_{\mathrm{Dice}}$) and cross-entropy loss ($L_{\mathrm{CE}}$). Both 3D U-Nets used the same training protocol, data splits, augmentation techniques, and optimization hyperparameters.
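The ablation baseline loss $L_{\mathrm{Dice}} + L_{\mathrm{CE}}$ can be illustrated in NumPy on softmax probabilities and one-hot labels. This sketch assumes equal weighting of the two terms and a soft Dice averaged over classes. The anatomy-aware term is defined elsewhere in the paper and is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def soft_dice_loss(probs, onehot, eps=1e-5):
    """1 - mean soft Dice over classes; arrays have shape (n_classes, *spatial)."""
    axes = tuple(range(1, probs.ndim))
    intersection = (probs * onehot).sum(axis=axes)
    denominator = probs.sum(axis=axes) + onehot.sum(axis=axes)
    dice = (2 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()


def cross_entropy_loss(probs, onehot, eps=1e-7):
    """Voxel-averaged categorical cross-entropy computed from softmax probabilities."""
    log_p = np.log(np.clip(probs, eps, 1.0))
    return float(-(onehot * log_p).sum(axis=0).mean())


def baseline_loss(probs, onehot):
    """Ablation loss L_Dice + L_CE (equal weights assumed)."""
    return soft_dice_loss(probs, onehot) + cross_entropy_loss(probs, onehot)
```

A perfect prediction gives a loss of about 0. A uniform prediction over three classes gives a cross-entropy of ln 3 plus a positive Dice term.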

Evaluation metrics

2.8.

We evaluated the performance of the U-Nets using a set of quantitative metrics. $P$ denotes the predicted segmentation (network output), and $G$ denotes the ground truth segmentation (manual annotations). For each test image, we calculated all metrics separately for the fascicle and epineurium classes (where applicable).

Segmentation accuracy

2.8.1.

Dice similarity coefficient

2.8.1.1.

The Dice similarity coefficient (DSC) (Dice 1945) measures the volumetric overlap between the predicted and ground truth segmentations, with values ranging from 0 (no overlap) to 1 (perfect overlap):

$$\mathrm{DSC} = \frac{2\left|P \cap G\right|}{\left|P\right| + \left|G\right|}$$

where $\left|P \cap G\right|$ represents the number of true positive voxels, and $\left|P\right|$ and $\left|G\right|$ represent the total number of voxels in the predicted and ground truth segmentations, respectively.
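The DSC definition translates directly into NumPy. This sketch computes it per class from integer label maps. The label encoding (1 = fascicle, 2 = epineurium) and the convention that two empty masks score 1 are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, gt):
    """Volumetric Dice for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0  # both empty: treated as perfect agreement (a convention choice)
    return 2.0 * np.logical_and(pred, gt).sum() / total


def per_class_dice(pred_labels, gt_labels, classes=(1, 2)):
    """Dice per class in integer label maps (hypothetical labels: 1=fascicle, 2=epineurium)."""
    return {c: dice_coefficient(pred_labels == c, gt_labels == c) for c in classes}
```

For instance, two 4 × 4 × 4 cubes offset by one voxel along one axis share 48 of their 64 voxels, which gives DSC = 2 · 48 / 128 = 0.75.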

Surface Dice similarity coefficient

2.8.1.2.

The surface DSC (Nikolov et al 2021) quantifies boundary accuracy by measuring the overlap between the predicted and ground truth surfaces within a specified tolerance, with values ranging from 0 (no surface overlap) to 1 (perfect surface overlap). For a solid label $L$ with surface $S(L)$, its border region $B$ is defined as:

$$B(L,\tau) = \left\{ x \in \mathbb{R}^3 \mid \exists\, y \in S(L),\ d(x,y) \le \tau \right\}$$

where $d(\cdot,\cdot)$ denotes the Euclidean distance and $\tau$ is a distance tolerance, which we set to one voxel. The surface DSC is then calculated as:

$$\text{Surface DSC} = \frac{\left|B(P,\tau) \cap S(G)\right| + \left|B(G,\tau) \cap S(P)\right|}{\left|S(P)\right| + \left|S(G)\right|}$$

where $S(P)$ is the surface of the predicted segmentation, $S(G)$ is the surface of the ground truth segmentation, and the intersection terms represent the portions of each surface that fall within the border region of the other.
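A voxel-count approximation of the surface DSC can be sketched with `scipy.ndimage`. Here a mask's surface is assumed to be the set of mask voxels with at least one face-connected neighbor outside the mask. The reference implementation accompanying Nikolov et al instead weights surface elements by their area, so this sketch can differ slightly from it. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def surface_voxels(mask):
    """Mask voxels that have at least one face-connected neighbor outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    return mask & ~ndimage.binary_erosion(mask)


def surface_dice(pred, gt, tol=1.0, spacing=None):
    """Voxel-count approximation of the surface DSC at tolerance `tol`.

    `tol` and `spacing` share units (voxels by default, or micrometers if,
    for example, spacing=(11.4, 11.4, 11.4)).
    """
    s_pred, s_gt = surface_voxels(pred), surface_voxels(gt)
    n_pred, n_gt = s_pred.sum(), s_gt.sum()
    if n_pred == 0 and n_gt == 0:
        return 1.0
    if n_pred == 0 or n_gt == 0:
        return 0.0
    # Distance from every voxel to the nearest surface voxel of each mask
    dist_to_pred = ndimage.distance_transform_edt(~s_pred, sampling=spacing)
    dist_to_gt = ndimage.distance_transform_edt(~s_gt, sampling=spacing)
    gt_in_pred_border = (dist_to_pred[s_gt] <= tol).sum()  # |B(P, tau) ∩ S(G)|
    pred_in_gt_border = (dist_to_gt[s_pred] <= tol).sum()  # |B(G, tau) ∩ S(P)|
    return (gt_in_pred_border + pred_in_gt_border) / (n_pred + n_gt)
```

Precomputing one Euclidean distance transform per surface makes the border-region membership test a simple array lookup, instead of a pairwise search over surface points.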

Average symmetric surface distance (ASSD)

2.8.1.3.

The ASSD (Heimann et al 2009) measures the average physical distance between the surfaces of the predicted and ground truth segmentations, indicating the magnitude of segmentation errors. A lower ASSD value indicates better agreement between the predicted and ground truth surfaces:

$$\mathrm{ASSD} = \frac{1}{\left|S(G)\right| + \left|S(P)\right|}\left( \sum_{g \in S(G)} \min_{p \in S(P)} d(g,p) + \sum_{p \in S(P)} \min_{g \in S(G)} d(p,g) \right)$$

where $d(\cdot,\cdot)$ denotes the Euclidean distance, $g$ is a point on the ground truth surface, and $p$ is a point on the predicted surface.
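The ASSD can be computed with the same distance-transform approach. The sketch below uses physical voxel spacing, so the result is in micrometers for 11.4 μm isotropic voxels. The surface definition (mask voxels with a face-connected neighbor outside the mask) and the conventions for empty surfaces are assumptions.

```python
import numpy as np
from scipy import ndimage


def surface_voxels(mask):
    """Mask voxels that have at least one face-connected neighbor outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    return mask & ~ndimage.binary_erosion(mask)


def assd(pred, gt, spacing=(11.4, 11.4, 11.4)):
    """Average symmetric surface distance, in the units of `spacing`."""
    s_pred, s_gt = surface_voxels(pred), surface_voxels(gt)
    if not s_pred.any() and not s_gt.any():
        return 0.0
    if not s_pred.any() or not s_gt.any():
        return float("inf")  # undefined when only one surface exists (convention choice)
    # Each surface voxel's distance to the closest voxel of the other surface
    dist_to_pred = ndimage.distance_transform_edt(~s_pred, sampling=spacing)
    dist_to_gt = ndimage.distance_transform_edt(~s_gt, sampling=spacing)
    total = dist_to_pred[s_gt].sum() + dist_to_gt[s_pred].sum()
    return total / (s_gt.sum() + s_pred.sum())
```

Because both directed sums share one denominator, a prediction cannot lower its ASSD by matching only part of the ground truth surface.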

Fascicle detection accuracy

2.8.2.

In addition to standard segmentation metrics, we evaluated the network’s ability to correctly identify fascicle instances in cross sections. We defined binary fascicle masks ($F$) in which voxels belonging to fascicle tissue were assigned a value of 1 and all other voxels were assigned a value of 0. We identified individual fascicle instances in each cross section by applying an 8-connectivity connected-components algorithm to the fascicle masks. We matched ground truth ($F_{\mathrm{G},i}$) and predicted ($F_{\mathrm{P},j}$) instances using the intersection over union (IoU):

$$\mathrm{IoU}\left(F_{\mathrm{G},i}, F_{\mathrm{P},j}\right) = \frac{\left|F_{\mathrm{G},i} \cap F_{\mathrm{P},j}\right|}{\left|F_{\mathrm{G},i} \cup F_{\mathrm{P},j}\right|}.$$

For a given IoU threshold $t$, we identified the matching pairs (i.e., true positives) of fascicle instances for which $\mathrm{IoU}\left(F_{\mathrm{G},i}, F_{\mathrm{P},j}\right) \ge t$. Based on these matches, we calculated the fascicle $F_1$ score at threshold $t$ as:

$$F_1(t) = \frac{2 \cdot \mathrm{TP}(t)}{2 \cdot \mathrm{TP}(t) + \mathrm{FP}(t) + \mathrm{FN}(t)}$$

where $\mathrm{TP}(t)$, $\mathrm{FP}(t)$, and $\mathrm{FN}(t)$ are the numbers of true positives, false positives, and false negatives at threshold $t$, respectively.

We evaluated the $F_1$ score at IoU thresholds from 0.50 to 0.95 in increments of 0.05. The $F_1$ score balances precision and recall; a predicted fascicle counts as a true positive if its IoU with a ground truth fascicle meets or exceeds the threshold. Using thresholds of $t \ge 0.5$ ensured that each ground truth fascicle had at most one match in the predictions, and vice versa.
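Instance matching and the fascicle $F_1$ score for a single cross section can be sketched with `scipy.ndimage.label` using an 8-connected structuring element. The IoU matrix is built from a joint histogram of ground truth and predicted labels. As in the text, every pair with IoU ≥ t counts as a match, relying on t ≥ 0.5 for one-to-one matching. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

EIGHT_CONNECTED = np.ones((3, 3), dtype=int)


def instance_iou_matrix(gt_slice, pred_slice):
    """IoU between every ground truth and every predicted fascicle in a 2D cross section."""
    gt_lab, n_gt = ndimage.label(gt_slice, structure=EIGHT_CONNECTED)
    pred_lab, n_pred = ndimage.label(pred_slice, structure=EIGHT_CONNECTED)
    # counts[i, j] = number of pixels with gt label i and pred label j (0 = background)
    counts = np.zeros((n_gt + 1, n_pred + 1), dtype=np.int64)
    np.add.at(counts, (gt_lab.ravel(), pred_lab.ravel()), 1)
    area_gt = counts.sum(axis=1)[1:, None]
    area_pred = counts.sum(axis=0)[None, 1:]
    inter = counts[1:, 1:]
    union = area_gt + area_pred - inter
    return inter / np.maximum(union, 1)


def fascicle_f1(gt_slice, pred_slice, t):
    """Fascicle F1 at IoU threshold t (t >= 0.5, so matches are one-to-one)."""
    iou = instance_iou_matrix(gt_slice, pred_slice)
    n_gt, n_pred = iou.shape
    tp = int((iou >= t).sum())
    fp, fn = n_pred - tp, n_gt - tp
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom
```

Sweeping `t` over `np.arange(0.5, 0.96, 0.05)` reproduces the threshold range described. For example, if a prediction misses one of two ground truth fascicles and has no false positives, $F_1 = 2/3$.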

At a threshold of $t = 0.7$, we quantified the fraction of fascicles that were over-segmented (i.e., one ground truth fascicle was split into multiple predicted fascicles) or under-segmented (i.e., multiple ground truth fascicles were merged into a single predicted fascicle). We also measured the fraction of missed fascicles (i.e., false negatives) by cross-sectional area category (Buyukcelik et al 2023): tiny (< 0.02 mm²), small (0.02–0.09 mm²), medium (0.09–0.3 mm²), and large (> 0.3 mm²). These four categories corresponded to effective circular diameters of < 0.16 mm, 0.16–0.34 mm, 0.34–0.62 mm, and > 0.62 mm, respectively. We defined the effective circular diameter as the diameter of a circle with the same area as the fascicle segmentation.
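The effective circular diameter follows from $A = \pi (d/2)^2$, so $d_{\mathrm{eff}} = 2\sqrt{A/\pi}$. The short sketch below reproduces the diameter bounds quoted for the size categories and assigns a category from a pixel count. It assumes 11.4 μm in-plane pixels and half-open bin edges; the text gives the ranges but not how the edges are handled. The helper names are hypothetical.

```python
import math

PIXEL_SIZE_MM = 0.0114  # 11.4 um isotropic voxels


def effective_diameter_mm(area_mm2):
    """Diameter of the circle whose area equals the fascicle's cross-sectional area."""
    return 2.0 * math.sqrt(area_mm2 / math.pi)


def size_category(n_pixels):
    """Size class of a fascicle from its pixel count in one cross section.

    Bin-edge handling is an assumption; only the ranges are given in the text.
    """
    area = n_pixels * PIXEL_SIZE_MM ** 2
    if area < 0.02:
        return "tiny"
    if area < 0.09:
        return "small"
    if area <= 0.3:
        return "medium"
    return "large"


for a in (0.02, 0.09, 0.3):
    print(a, round(effective_diameter_mm(a), 2))
```

At this resolution, one pixel covers about 1.3 × 10⁻⁴ mm², so the 0.02 mm² boundary between tiny and small fascicles corresponds to roughly 150 pixels.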

Over-segmentation, under-segmentation, and missed fascicle rates were calculated as the proportion of affected fascicles relative to the total number of ground truth fascicles. These metrics were computed at 8-slice intervals (∼0.1 mm) and averaged across all sampled slices of each test volume.

Anatomical accuracy

2.8.3.

In addition to standard segmentation metrics and object-level precision, we also evaluated the anatomical coherence of the network’s predictions based on the preservation of 3D fascicle connectivity and the rate of topological violations.

Centerline Dice

2.8.3.1.

The centerline Dice score (clDice) (Shit et al 2021) measures the network’s ability to preserve the correct anatomical connectivity of fascicles. To calculate this similarity metric, we computed the topology precision ($T_{\mathrm{prec}}$) and sensitivity ($T_{\mathrm{sens}}$) as:

$$T_{\mathrm{prec}} = \frac{\left|\mathrm{Sk}(F_{\mathrm{P}}) \cap F_{\mathrm{G}}\right|}{\left|\mathrm{Sk}(F_{\mathrm{P}})\right|}$$

$$T_{\mathrm{sens}} = \frac{\left|\mathrm{Sk}(F_{\mathrm{G}}) \cap F_{\mathrm{P}}\right|}{\left|\mathrm{Sk}(F_{\mathrm{G}})\right|}$$

where $\mathrm{Sk}(F_{\mathrm{P}})$ and $\mathrm{Sk}(F_{\mathrm{G}})$ are the skeletonized predicted and ground truth fascicle masks, respectively, generated using the skeletonize function in the Python scikit-image package (Walt et al 2014). The clDice is then calculated as:

$$\mathrm{clDice} = \frac{2 \cdot T_{\mathrm{prec}} \cdot T_{\mathrm{sens}}}{T_{\mathrm{prec}} + T_{\mathrm{sens}}}$$

with values ranging from 0 (no topology preservation) to 1 (perfect topology preservation).
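The clDice formulas map onto `skimage.morphology.skeletonize`, which is the function named in the text. In recent scikit-image releases it also accepts 3D arrays, using Lee's method. This minimal sketch returns 0 when either skeleton is empty; that is a convention choice, not something stated in the paper.

```python
import numpy as np
from skimage.morphology import skeletonize


def cl_dice(pred, gt):
    """Centerline Dice between two binary fascicle masks (2D or 3D)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    sk_pred = skeletonize(pred)
    sk_gt = skeletonize(gt)
    if not sk_pred.any() or not sk_gt.any():
        return 0.0  # convention: connectivity cannot be assessed without a skeleton
    t_prec = np.logical_and(sk_pred, gt).sum() / sk_pred.sum()  # predicted centerline inside GT
    t_sens = np.logical_and(sk_gt, pred).sum() / sk_gt.sum()    # GT centerline inside prediction
    if t_prec + t_sens == 0:
        return 0.0
    return 2 * t_prec * t_sens / (t_prec + t_sens)
```

A prediction that breaks a continuous fascicle loses part of the ground truth centerline and therefore lowers $T_{\mathrm{sens}}$. Spurious bridges between fascicles add predicted centerline outside the ground truth and lower $T_{\mathrm{prec}}$.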

Anatomical error rate

2.8.3.2.

We quantified the anatomical quality of neural tissue predictions by measuring the frequency of structural abnormalities, as detailed in supplementary note 3. Two types of errors were identified: (1) exposed fascicle voxels that directly contacted background voxels, and (2) abrupt transitions between neural structures along the nerve. The anatomical error rate was calculated as the ratio of anomalous voxels to total foreground voxels (i.e., fascicle and epineurium).
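Error type (1) lends itself to a simple morphological check. The sketch below flags fascicle voxels that share a face with a background voxel and divides their count by the number of foreground voxels. The integer label encoding and the face-connected neighborhood are assumptions. Error type (2) is defined only in supplementary note 3 and is not reproduced, so this gives a partial, illustrative error rate rather than the paper's metric.

```python
import numpy as np
from scipy import ndimage

BACKGROUND, FASCICLE, EPINEURIUM = 0, 1, 2  # hypothetical label encoding


def exposed_fascicle_rate(labels):
    """Fraction of foreground voxels that are fascicle voxels touching background.

    Covers error type (1) only. Voxels outside the array are not treated as background.
    """
    labels = np.asarray(labels)
    fascicle = labels == FASCICLE
    # Background voxels plus every voxel sharing a face with a background voxel
    near_background = ndimage.binary_dilation(labels == BACKGROUND)
    exposed = fascicle & near_background
    foreground = np.count_nonzero(labels != BACKGROUND)
    return exposed.sum() / foreground if foreground else 0.0
```

In a correct prediction, every fascicle is enclosed by epineurium, so this rate should be 0. Any fascicle that reaches the background directly contributes its boundary voxels to the numerator.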

Inter-slice boundary consistency

2.8.3.3.

The inter-slice boundary consistency quantifies the stability of predicted contours along the nerve, i.e., the absence of inter-slice jitter in a given boundary. For each pair of adjacent slices $z$ and $z + 1$, we computed the boundary F1 score (BF score) (Csurka et al 2013) with single-pixel tolerance. Let $B_z$ and $B_{z+1}$ be the boundary regions (one-pixel-wide contours) of the predicted mask in slices $z$ and $z + 1$, respectively.
We calculated the boundary precision ($P_{z,z+1}$) and recall ($R_{z,z+1}$) as:

$$P_{z,z+1} = \frac{1}{\left|B_z\right|} \sum_{x \in B_z} 1\left[ d\left(x, B_{z+1}\right) < \tau \right]$$

$$R_{z,z+1} = \frac{1}{\left|B_{z+1}\right|} \sum_{x \in B_{z+1}} 1\left[ d\left(x, B_z\right) < \tau \right]$$

where $d(\cdot,\cdot)$ denotes the Euclidean distance, $1[\cdot]$ is the indicator function (1 if the condition is true, 0 otherwise), and $\tau = 1$ is the distance tolerance in pixels. The BF score between slices $z$ and $z + 1$ is then calculated as:

$$\text{BF score}_{z,z+1} = \frac{2 \cdot P_{z,z+1} \cdot R_{z,z+1}}{P_{z,z+1} + R_{z,z+1}}$$

with values ranging from 0 (no boundary consistency) to 1 (maximal boundary consistency).
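The inter-slice BF score can be sketched in NumPy and SciPy. Here the contour is taken as mask pixels with a 4-connected neighbor outside the mask, and the comparison uses the strict inequality written in the formulas. Averaging the scores over all adjacent slice pairs is an assumption, because the aggregation step is not stated. The helper names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def contour(mask2d):
    """One-pixel-wide inner contour of a 2D binary mask."""
    mask2d = np.asarray(mask2d, dtype=bool)
    return mask2d & ~ndimage.binary_erosion(mask2d)


def bf_score(mask_a, mask_b, tau=1.0):
    """Boundary F1 between two adjacent slices, using d(x, B) < tau as in the formulas."""
    b_a, b_b = contour(mask_a), contour(mask_b)
    if not b_a.any() and not b_b.any():
        return 1.0
    if not b_a.any() or not b_b.any():
        return 0.0
    dist_to_b = ndimage.distance_transform_edt(~b_b)
    dist_to_a = ndimage.distance_transform_edt(~b_a)
    precision = np.mean(dist_to_b[b_a] < tau)
    recall = np.mean(dist_to_a[b_b] < tau)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_interslice_bf(volume_mask, tau=1.0):
    """Mean BF score over all adjacent slice pairs along axis 0 (averaging is assumed)."""
    scores = [bf_score(volume_mask[z], volume_mask[z + 1], tau)
              for z in range(volume_mask.shape[0] - 1)]
    return float(np.mean(scores))
```

On an integer pixel grid with τ = 1, the strict inequality counts only contour pixels that coincide exactly between slices, so the choice between < and ≤ noticeably changes the score.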

Fascicle split/merge detection

2.8.3.4.

Fascicles split and merge along the nerve (Upadhye et al 2022). We evaluated the model’s ability to detect these structural changes by analyzing the rates of split/merge events in predictions versus ground truth.

We identified fascicle split and merge locations in the ground truth and network predictions by analyzing where fascicle boundaries divide and fuse (Upadhye et al 2022). For each image, we calculated the rate of split or merge events per millimeter as $R_{\mathrm{split}} = N_{\mathrm{split}}/L$ and $R_{\mathrm{merge}} = N_{\mathrm{merge}}/L$, where $N_{\mathrm{split}}$ and $N_{\mathrm{merge}}$ are the numbers of detected split and merge events, respectively, and $L$ is the length of the nerve segment (along the z axis) in millimeters. The deviation from the ground truth event rate was quantified as $\Delta R = R_{\mathrm{P}} - R_{\mathrm{G}}$, where $R_{\mathrm{P}}$ and $R_{\mathrm{G}}$ are the predicted and ground truth event rates, respectively. For both split and merge events, we computed:

$$\Delta R_{\mathrm{split}} = R_{\mathrm{split},\mathrm{P}} - R_{\mathrm{split},\mathrm{G}}$$

$$\Delta R_{\mathrm{merge}} = R_{\mathrm{merge},\mathrm{P}} - R_{\mathrm{merge},\mathrm{G}}$$

The split/merge event rate deviation was quantified as the average absolute difference between predicted and ground truth rates:

$$\text{Split/merge rate deviation} = \frac{\left|\Delta R_{\mathrm{split}}\right| + \left|\Delta R_{\mathrm{merge}}\right|}{2}$$

with lower values indicating better split/merge detection accuracy.
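The rate deviation can be illustrated end to end with a simplified event detector. In this sketch, a split is counted when one fascicle in slice $z$ overlaps two or more fascicles in slice $z + 1$, and a merge when the reverse holds. This adjacent-slice overlap rule is an assumption made for illustration and is not necessarily the procedure of Upadhye et al. The nerve length $L$ is taken as the number of slices times 11.4 μm, and the helper names are hypothetical.

```python
from collections import Counter

import numpy as np
from scipy import ndimage

EIGHT_CONNECTED = np.ones((3, 3), dtype=int)
SLICE_THICKNESS_MM = 0.0114  # 11.4 um voxels


def count_split_merge(mask):
    """Count split and merge events in a (z, y, x) fascicle mask from slice-to-slice overlaps."""
    n_split = n_merge = 0
    for z in range(mask.shape[0] - 1):
        lab_a, _ = ndimage.label(mask[z], structure=EIGHT_CONNECTED)
        lab_b, _ = ndimage.label(mask[z + 1], structure=EIGHT_CONNECTED)
        both = (lab_a > 0) & (lab_b > 0)
        links = set(zip(lab_a[both].tolist(), lab_b[both].tolist()))
        successors = Counter(a for a, _ in links)    # fascicles in z+1 touched by each fascicle in z
        predecessors = Counter(b for _, b in links)  # fascicles in z touched by each fascicle in z+1
        n_split += sum(1 for n in successors.values() if n >= 2)
        n_merge += sum(1 for n in predecessors.values() if n >= 2)
    return n_split, n_merge


def split_merge_rate_deviation(pred_mask, gt_mask):
    """Mean absolute deviation of split and merge rates (events per mm)."""
    length_mm = gt_mask.shape[0] * SLICE_THICKNESS_MM
    split_p, merge_p = count_split_merge(pred_mask)
    split_g, merge_g = count_split_merge(gt_mask)
    d_split = (split_p - split_g) / length_mm
    d_merge = (merge_p - merge_g) / length_mm
    return (abs(d_split) + abs(d_merge)) / 2
```

For example, a prediction that misses the single split in a 10-slice (0.114 mm) segment gives $|\Delta R_{\mathrm{split}}| \approx 8.8$ events/mm and a rate deviation of about 4.4 events/mm.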

Results

We developed an automated, deep learning-based approach to segment volumetric microCT images of cadaveric human vagus nerves by delineating the nerve boundary and 3D fascicular structure. We trained a 3D U-Net and evaluated its performance through segmentation metrics and by assessing its consistency with expected anatomical structures.

Segmentation accuracy

3.1.

We implemented a 3D U-Net to segment vagus nerve fascicles and epineurium in microCT images and evaluated its performance using five-fold cross-validation. The 3D U-Net outperformed the 2D U-Net in both spatial overlap and boundary accuracy metrics (figure 3). Segmentation maps from three representative nerve samples, selected from the test set of their respective folds, showed that the 3D U-Net produced clearer, smoother fascicle boundaries with fewer artifacts compared to the 2D U-Net segmentations (figure 3(a)).

Figure 3. 3D U-Net yields more accurate automated segmentations of vagus nerve fascicles and epineurium from microCT compared to a 2D U-Net. (a) Representative results comparing 3D and 2D U-Nets for three nerve samples ranked by 3D U-Net Dice similarity coefficient (DSC) at (i) high (90th percentile), (ii) medium (50th percentile), and (iii) low (20th percentile) performance. Images show 3D rendered volumes (top row of each subpanel) and corresponding middle cross sections (bottom row of each subpanel) of raw microCT, ground truth (GT), and predictions from 3D and 2D U-Nets. Scale bars are 1 mm. (b)–(d) Segmentation metrics for each class (fascicles or epineurium) across five cross-validation folds (N = 100 images total). Plots show mean values with 95% confidence intervals; individual fold means are shown as gray dots connected by dashed lines. Performance metrics include: (b) DSC, (c) surface DSC (single-pixel tolerance), and (d) average symmetric surface distance (ASSD, in μm). ↑+, higher is better; ↓+, lower is better. ***, p < 0.0001.

Quantitatively, the 3D U-Net demonstrated superior segmentation performance compared to the 2D U-Net across multiple metrics. For volumetric overlap, it achieved higher DSCs for both fascicles (0.928 vs. 0.903, +2.8%, p < 0.0001) and epineurium (0.933 vs. 0.920, +1.4%, p < 0.0001; figure 3(b)). Similarly, the 3D U-Net improved boundary delineation, as shown by higher surface DSC values under single-pixel tolerance for fascicles (0.819 vs. 0.743, +10.2%, p < 0.0001) and epineurium (0.822 vs. 0.770, +6.8%, p < 0.0001; figure 3(c)). The 3D U-Net significantly reduced absolute boundary error, as indicated by smaller ASSD values for fascicles (35.9 μm vs. 107 μm, −66.4%, p < 0.0001) and epineurium (39.5 μm vs. 111 μm, −64.4%, p < 0.0001; figure 3(d)). Additional voxel-wise metrics further confirmed the improved performance of the 3D U-Net across IoU, sensitivity, and specificity measurements (table S3 in supplementary note 4).
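The percentage differences in this paragraph agree with the relative change (3D − 2D)/2D computed from the reported means. The following is a quick arithmetic check; the function name is hypothetical, and the means are the rounded values given in the text.

```python
def relative_change_pct(value_3d, value_2d):
    """Percent change of a 3D U-Net metric relative to the 2D U-Net baseline."""
    return 100.0 * (value_3d - value_2d) / value_2d


reported = {
    "fascicle DSC": (0.928, 0.903),
    "epineurium DSC": (0.933, 0.920),
    "fascicle surface DSC": (0.819, 0.743),
    "epineurium surface DSC": (0.822, 0.770),
    "fascicle ASSD (um)": (35.9, 107),
    "epineurium ASSD (um)": (39.5, 111),
}
for name, (v3d, v2d) in reported.items():
    print(f"{name}: {relative_change_pct(v3d, v2d):+.1f}%")
```

For distance metrics such as ASSD, a negative change indicates an improvement.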

Detection of fascicle instances in cross sections

3.2.

We compared the accuracy of the 3D and 2D U-Nets in identifying individual fascicles in cross sections by matching predicted instances with the ground truth using IoU overlap and by calculating the fascicle F1 score at different levels of stringency to identify a ‘match’.

A qualitative comparison of two example nerve samples revealed clear differences: the 3D U-Net identified more fascicle instances and produced fewer segmentation errors than the 2D U-Net (figure 4(a)). Quantitatively, the 3D U-Net also performed better, with higher fascicle $F_1$ scores across IoU thresholds from 0.5 to 0.9 (figure 4(b)). At an IoU threshold of 0.7, the 3D U-Net produced fewer under-segmentation errors (i.e., adjacent fascicles merged into one) than the 2D U-Net (3.94% vs. 4.62%, p < 0.01), while both showed similar over-segmentation rates (p = 0.66) (figure 4(c)).

Figure 4. 3D U-Net improves detection of individual fascicles in nerve cross sections. (a) MicroCT cross sections and fascicle segmentations by 3D and 2D U-Nets for two nerve samples ranked by 3D U-Net’s fascicle F1 score at (i) high (90th percentile) and (ii) low (20th percentile) performance. Fascicle F1 score (at intersection-over-union [IoU] threshold = 0.7) and Dice similarity coefficient (DSC) are labeled. Scale bars are 1 mm. Segmented fascicles (yellow) are overlaid with ground truth fascicles (GT, white dashed lines) with per-fascicle IoU values (omitted for false positives/negatives). (b) Mean fascicle F1 score with 95% confidence interval (CI) at various IoU thresholds for 3D and 2D U-Nets. Higher IoU means stricter overlap criteria for matching. (c) Comparison of per-fascicle over-segmentation (left) and under-segmentation (right) rates. (d) Fascicle size classification into four categories (Buyukcelik et al 2023) (see distribution in figure S4 in supplementary note 5). deff, effective circular diameter. Scale bar is 1 mm. (e) Percentage of missed fascicles. In (c), (e), plots show mean values with 95% CIs; individual fold means are shown as gray dots connected by dashed lines. ↑+, higher is better; ↓+, lower is better. ****, p < 0.0001; ***, p < 0.001; **, p < 0.01; *, p < 0.05; ns, not significant.

A size-based fascicle detection analysis indicated that the 3D U-Net more effectively captured small (miss rate: 3.39% vs. 4.39%, p < 0.001) and medium (miss rate: 2.16% vs. 2.42%, p < 0.01) fascicles compared to the 2D U-Net (figures 4(d) and (e)). These two size categories constituted 66% of all fascicles in our dataset (figure S4 in supplementary note 5). Both approaches performed similarly for tiny (p = 0.059) and large (p = 0.047) fascicles. For successfully matched fascicles, the 3D U-Net achieved better instance-level segmentation accuracy, with higher IoU and lower Hausdorff distance values for small and medium fascicles (table S4 in supplementary note 5).
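The size stratification relies on the effective circular diameter, d_eff = 2·sqrt(A/π), which is the diameter of a circle with the same area A as the fascicle cross section. The sketch below makes several assumptions:

- It uses the 11.4 μm pixel size of this dataset.
- The 0.16 and 0.62 mm bin edges come from the small-plus-medium range quoted in the Discussion.
- The small/medium boundary is a hypothetical parameter, because its value from Buyukcelik et al (2023) is not stated here.
- A ‘miss’ is a ground-truth fascicle with no matched prediction.

```python
import numpy as np

PIXEL_MM = 0.0114  # 11.4 μm isotropic voxels, as reported for the microCT volumes


def effective_diameter_mm(n_pixels: float, pixel_mm: float = PIXEL_MM) -> float:
    """Diameter (mm) of the circle whose area equals the fascicle's cross-sectional area."""
    area_mm2 = n_pixels * pixel_mm ** 2
    return float(2.0 * np.sqrt(area_mm2 / np.pi))


def size_category(d_eff: float, small_medium_edge: float) -> str:
    """Assign a size class; small_medium_edge (mm) is a hypothetical, user-supplied boundary."""
    if d_eff < 0.16:
        return "tiny"
    if d_eff < small_medium_edge:
        return "small"
    if d_eff <= 0.62:
        return "medium"
    return "large"


def miss_rate_percent(d_eff_gt, matched, target: str, small_medium_edge: float) -> float:
    """Percentage of ground-truth fascicles in one size class that have no matched prediction."""
    in_class = np.array([size_category(d, small_medium_edge) == target for d in d_eff_gt])
    missed = ~np.asarray(matched, dtype=bool)
    if not in_class.any():
        return float("nan")
    return float(100.0 * np.count_nonzero(in_class & missed) / np.count_nonzero(in_class))
```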

3.3. Anatomical accuracy

Visual comparisons showed that the 3D U-Net maintained continuous fascicular paths, while the 2D U-Net produced both false connections and disconnected paths (figure 5(a)). Quantitatively, the 3D U-Net yielded higher clDice scores than the 2D U-Net (0.90 vs. 0.84, +6.7%, p < 0.0001; figure 5(b)), confirming that its predicted fascicle connectivity was more accurate. Further, the 3D approach lowered anatomical error rates ∼2.5-fold compared to the 2D U-Net (0.65% vs. 1.60%, p < 0.0001; figures 5(c) and (d)). This corresponds to ∼9000 fewer erroneous voxels per test image on average. In ablation experiments, we trained the 3D U-Net without the anatomy-aware loss. These experiments confirmed that a priori anatomical knowledge enhanced our 3D U-Net’s ability to preserve nerve structure compared to a baseline 3D architecture (table S5 in supplementary note 6).
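The clDice score follows Shit et al (2021). It is the harmonic mean of two quantities:

- topology precision, the fraction of the predicted skeleton that lies inside the ground-truth mask;
- topology sensitivity, the fraction of the ground-truth skeleton that lies inside the predicted mask.

The minimal sketch below takes the skeletons as inputs and assumes they were computed beforehand (for example with scikit-image's `skeletonize`). The function name is illustrative.

```python
import numpy as np


def cl_dice(pred: np.ndarray, gt: np.ndarray, pred_skel: np.ndarray, gt_skel: np.ndarray) -> float:
    """Centerline Dice from binary masks and their precomputed skeletons (2D or 3D)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_skel, gt_skel = pred_skel.astype(bool), gt_skel.astype(bool)
    # Topology precision: how much of the predicted centerline lies inside the true fascicles.
    t_prec = np.count_nonzero(pred_skel & gt) / max(np.count_nonzero(pred_skel), 1)
    # Topology sensitivity: how much of the true centerline is covered by the prediction.
    t_sens = np.count_nonzero(gt_skel & pred) / max(np.count_nonzero(gt_skel), 1)
    if t_prec + t_sens == 0:
        return 0.0
    return float(2 * t_prec * t_sens / (t_prec + t_sens))
```

A single missing slice in an otherwise perfect tubular fascicle leaves topology precision at 1 but lowers topology sensitivity. This is how clDice registers the broken paths visible in the 2D U-Net output.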

Figure 5. 3D U-Net preserves fascicle connectivity and reduces anatomical violations. (a) Predicted fascicle skeletons (red) vs. ground truth (GT, gray) for two examples ranked by 3D U-Net centerline Dice (clDice) score at (i) high (90th percentile) and (ii) medium (50th percentile) performance. clDice and average Dice similarity coefficient (DSC) values are labeled. Quantitative comparisons of (b) fascicle connectivity measured by clDice scores and (c) anatomical error rates (%) between 3D and 2D U-Nets. Plots show mean values with 95% confidence intervals; individual fold means are shown as gray dots connected by dashed lines. (d) Visualization of errors violating expected nerve anatomy (red voxels) for two samples ranked by 3D U-Net error rate at (i) low (20th percentile) and (ii) high (90th percentile) violation levels. Images show cross-sectional and 3D views, illustrating typical errors like broken and discontinuous fascicles. Cross section locations are marked by yellow dashed boxes on the 3D volumes. Voxel error rates and average DSC of example images are labeled. Scale bars are 1 mm. ↑, higher is better; ↓, lower is better. ****, p < 0.0001.

The 3D U-Net showed greater inter-slice consistency than both the 2D U-Net and the GT annotation (figures 6(a), (c) and (d)). The 3D U-Net generated fascicle segmentations with smooth boundaries along the nerve’s length, whereas the 2D U-Net produced jagged boundaries with evident artifacts (figure 6(c)). We quantified this observation along the longitudinal axis of the example nerve. There, the 3D U-Net consistently maintained BF scores above 0.8, roughly twice the values of the 2D U-Net (figure 6(d)). On average, the 3D U-Net yielded significantly more consistent boundaries for both epineurium (0.77 vs. 0.60, +28.3%, p < 0.0001) and fascicles (0.81 vs. 0.77, +5.2%, p < 0.0001; figure 6(a)). This boundary consistency is critical for tracking branches and fascicles along the nerve.
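Below is a minimal sketch of the inter-slice BF score, in which each cross section is compared with the next one along the nerve. It makes three assumptions:

- binary fascicle (or epineurium) masks are stacked along axis 0;
- distances are computed by brute force, which is practical only for small images;
- the 2-pixel tolerance is hypothetical, since the tolerance used in the study is not stated in this section.

```python
import numpy as np


def boundary_coords(mask: np.ndarray) -> np.ndarray:
    """Coordinates of foreground pixels that have at least one face-adjacent background pixel."""
    m = mask.astype(bool)
    padded = np.pad(m, 1)  # pixels outside the image count as background
    core = tuple(slice(1, -1) for _ in range(m.ndim))
    interior = m.copy()
    for axis in range(m.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(m & ~interior)


def bf_score(a: np.ndarray, b: np.ndarray, tol: float = 2.0) -> float:
    """Boundary F1 between two masks; boundary pixels within `tol` pixels of the other contour count as matched."""
    ba, bb = boundary_coords(a), boundary_coords(b)
    if len(ba) == 0 or len(bb) == 0:
        return float(len(ba) == len(bb))  # both empty -> 1.0, only one empty -> 0.0
    dist = np.linalg.norm(ba[:, None, :] - bb[None, :, :], axis=-1)  # brute-force pairwise distances
    precision = np.mean(dist.min(axis=1) <= tol)
    recall = np.mean(dist.min(axis=0) <= tol)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def inter_slice_bf(volume: np.ndarray, tol: float = 2.0) -> np.ndarray:
    """BF score between consecutive cross sections along axis 0 (the nerve's longitudinal axis)."""
    return np.array([bf_score(volume[z], volume[z + 1], tol) for z in range(volume.shape[0] - 1)])
```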

Figure 6. 3D U-Net improves accuracy of longitudinal fascicular structure, including inter-slice jitter and split/merge events. (a) Boundary F1 (BF score) comparison for fascicles (left) and epineurium (right) across 3D/2D predictions and ground truth (GT). Higher scores indicate better boundary consistency between consecutive cross sections. (b) Fascicle split/merge event deviation from GT (%), measured as percentage difference between predicted and GT event frequency (number of splits and merges per millimeter). In (a), (b), plots show mean values with 95% confidence intervals; individual fold means are shown as gray dots connected by dashed lines. ↑, higher is better; ↓, lower is better. ****, p < 0.0001. (c)–(e) Example nerve sample showing: (c) 3D renderings of fascicle segmentations from 3D U-Net (blue), 2D U-Net (orange), and GT (gray); dashed arrow marks location of origins in (d), (e). Scale bars are 1 mm. (d) BF score profiles along the example nerve sample in panel (c). (e) Distribution of fascicle merge (red) and split (blue) events for the example nerve sample in panel (c).

In the analysis of fascicle split/merge events, the 3D U-Net agreed substantially better with ground truth than the 2D U-Net (percentage deviation: 7.4% vs. 43.1%, p < 0.0001; figure 6(b)). In the example sample, the 2D U-Net generated many false split/merge events, whereas the 3D U-Net correctly detected most true events (figure 6(e)). When errors occurred, both networks tended to over-predict both split and merge events, resulting in a strong correlation between split and merge rate deviations (figure S5 in supplementary note 7).
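Split/merge events can be detected from the overlap between fascicles in consecutive cross sections. A fascicle that overlaps two or more fascicles in the next slice counts as a split. Two or more fascicles that overlap a single fascicle in the next slice count as a merge. The sketch below is a minimal version under these assumptions. It uses the 11.4 μm slice spacing to convert counts to events per millimeter and takes the absolute difference for the percentage deviation. The study's exact event definition may differ.

```python
from collections import Counter

import numpy as np


def count_split_merge(labels: np.ndarray) -> tuple:
    """Count split and merge events along axis 0 of a stack of per-slice 2D label maps (0 = background).

    Labels only need to be unique within a slice, e.g. scipy.ndimage.label applied slice by slice."""
    splits = merges = 0
    for z in range(labels.shape[0] - 1):
        a, b = labels[z], labels[z + 1]
        both = (a > 0) & (b > 0)
        # Unique (fascicle in slice z, fascicle in slice z+1) pairs that share at least one pixel.
        pairs = set(zip(a[both].tolist(), b[both].tolist()))
        fan_out = Counter(p for p, _ in pairs)  # successors of each fascicle in slice z
        fan_in = Counter(q for _, q in pairs)   # predecessors of each fascicle in slice z+1
        splits += sum(1 for n in fan_out.values() if n >= 2)
        merges += sum(1 for n in fan_in.values() if n >= 2)
    return splits, merges


def events_per_mm(labels: np.ndarray, voxel_mm: float = 0.0114) -> float:
    """Split plus merge events per millimeter of imaged nerve length."""
    splits, merges = count_split_merge(labels)
    return (splits + merges) / (labels.shape[0] * voxel_mm)


def percent_deviation(pred_rate: float, gt_rate: float) -> float:
    """Absolute deviation of predicted event frequency from ground truth, in percent (gt_rate > 0)."""
    return 100.0 * abs(pred_rate - gt_rate) / gt_rate
```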

Discussion

We developed an enhanced deep learning approach for accurate, automated 3D segmentation of vagus nerve fascicular morphology from microCT images, designed to overcome limitations of existing techniques. Specifically, we used a 3D U-Net CNN to leverage volumetric context (figure 1) and integrated a novel anatomy-aware loss function during training to enforce anatomical coherence (figure 2). This methodology yielded substantial improvements compared to a 2D U-Net across multiple evaluation metrics. Overall segmentation accuracy was significantly higher (figure 3), and task-specific evaluations confirmed improved fascicle detection (figure 4) and fewer anatomical errors (figure 5). The proposed 3D U-Net also improved boundary continuity and more accurately identified fascicle splits/merges along the nerve (figure 6). Collectively, these performance gains establish our method as a valuable tool for generating accurate, quantitative 3D anatomical maps of the vagal pathway from high-resolution imaging data. By addressing bottlenecks in segmentation quality and throughput, our approach provides data that are fundamental for understanding the nerve’s functional organization and for developing realistic computational models to guide VNS therapies.

4.1. 3D U-Net better resolves vagal morphology than 2D U-Net

Segmenting the vagus nerve is challenging due to its complex morphology, with highly variable fascicle sizes and counts across individuals (Pelot et al 2020) and along the nerve of a given individual (Upadhye et al 2022). The longitudinal changes in vagal morphology expose a significant limitation of 2D segmentation methods (Buyukcelik et al 2023, Jayaprakash et al 2023), which process each slice independently and ignore inter-slice context. Here, we demonstrated the benefits of a 3D U-Net CNN for analyzing 3D morphology. This represents the first such application for peripheral nerve segmentation and extends the established success of volumetric deep learning for other intricate structures, such as vasculature (Lapierre-Landry et al 2023) and airways (Garcia-Uceda et al 2021).

The 3D CNN’s ability to incorporate volumetric context resulted in superior segmentation performance. By integrating voxel information along the nerve, our 3D U-Net more accurately resolved the boundaries of fascicles and epineurium (figures 3(c) and (d)). The 3D U-Net also reduced boundary artifacts (e.g., broken, spurious, or jagged edges), which the 2D U-Net produces because it processes each slice independently. Practically, the higher surface DSC achieved by the 3D U-Net should reduce the manual correction time needed after automated segmentation (figure 3(c)) (Vaassen et al 2020).
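Surface DSC counts the fraction of both boundaries that lies within a tolerance of the other boundary. Vaassen et al (2020) evaluated it as a predictor of correction time, because it tracks how much contour would need editing. The minimal sketch below makes the following assumptions:

- the inputs are binary masks;
- the isotropic spacing equals the 11.4 μm voxel size;
- the tolerance is a hypothetical value;
- distances are computed by brute force, which is adequate only for small arrays.

```python
import numpy as np


def boundary_coords(mask: np.ndarray) -> np.ndarray:
    """Coordinates of foreground voxels that have at least one face-adjacent background voxel."""
    m = mask.astype(bool)
    padded = np.pad(m, 1)
    core = tuple(slice(1, -1) for _ in range(m.ndim))
    interior = m.copy()
    for axis in range(m.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(m & ~interior)


def surface_dice(pred: np.ndarray, gt: np.ndarray, tol_mm: float, spacing_mm: float = 0.0114) -> float:
    """Surface DSC at tolerance tol_mm for isotropic voxels of size spacing_mm (2D or 3D masks)."""
    bp = boundary_coords(pred) * spacing_mm
    bg = boundary_coords(gt) * spacing_mm
    if len(bp) == 0 or len(bg) == 0:
        return float(len(bp) == len(bg))
    dist = np.linalg.norm(bp[:, None, :] - bg[None, :, :], axis=-1)
    close_pred = np.count_nonzero(dist.min(axis=1) <= tol_mm)  # predicted boundary near the GT boundary
    close_gt = np.count_nonzero(dist.min(axis=0) <= tol_mm)    # GT boundary near the predicted boundary
    return (close_pred + close_gt) / (len(bp) + len(bg))
```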

Fascicle size and location are critical for computational modeling of PNS (Grinberg et al 2008, Davis et al 2023, Musselman et al 2023). The 3D U-Net achieved higher fascicle-level accuracy within cross sections than the 2D U-Net (figure 4(b)). Volumetric processing mitigated incorrect merging of adjacent fascicles (i.e., under-segmentation) (figure 4(c)). This failure mode is critical and frequent in 2D methods and arises from imperfect contrast and small spacing between fascicles. The gain in fascicle detection was size-dependent, with the most pronounced improvements for small- and medium-sized fascicles (0.16–0.62 mm in diameter) (figure 4(e)). These two categories together account for two-thirds of all fascicles (figure S4 in supplementary note 5). Notably, both networks performed worse on tiny fascicles (figure 4(e) and table S4 in supplementary note 5), likely due to the resolution limit of our microCT data. The insufficient definition of these smallest fascicles appeared to confound both manual annotation and network prediction, thus lowering the match rate.
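The two failure modes can be separated per ground-truth fascicle. The sketch below uses hypothetical criteria rather than the study's exact definitions. A prediction ‘covers’ a fascicle if it overlaps at least 10% of that fascicle's area. A ground-truth fascicle is then counted as:

- over-segmented when two or more predictions cover it;
- under-segmented when a prediction covering it also covers another ground-truth fascicle, i.e. adjacent fascicles were merged.

```python
import numpy as np


def over_under_rates(gt: np.ndarray, pred: np.ndarray, min_frac: float = 0.1) -> tuple:
    """Percentages of GT fascicles that are over-segmented (split) and under-segmented (merged).

    gt and pred are 2D instance label maps (0 = background); min_frac is a hypothetical coverage threshold."""
    gt_ids = [g for g in np.unique(gt) if g != 0]
    gt_area = {g: np.count_nonzero(gt == g) for g in gt_ids}
    over = under = 0
    for g in gt_ids:
        ids, counts = np.unique(pred[gt == g], return_counts=True)
        covering = ids[(ids != 0) & (counts >= min_frac * gt_area[g])]
        if len(covering) >= 2:
            over += 1  # one true fascicle is split across several predictions
        for p in covering:
            others, overlap = np.unique(gt[pred == p], return_counts=True)
            if any(o != 0 and o != g and c >= min_frac * gt_area[o] for o, c in zip(others, overlap)):
                under += 1  # the same prediction also covers a neighboring fascicle
                break
    n = max(len(gt_ids), 1)
    return 100.0 * over / n, 100.0 * under / n
```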

4.2. Incorporating anatomical constraints improves segmentation accuracy

Accurate segmentation of vagal morphology requires more than strong performance on established general-purpose segmentation metrics. For meaningful downstream modeling and analysis, outputs must faithfully reproduce the anatomy of peripheral nerves. While U-Net architectures (Ronneberger et al 2015, Çiçek et al 2016, Isensee et al 2021) provided a foundation for high-quality segmentation, we enhanced network training by incorporating structural priors, i.e., pre-existing knowledge about constraints on neuroanatomical structures. Specifically, we implemented an anatomy-aware loss function that taught the network the fundamental relationship wherein fascicles are enclosed by epineurium (Stewart 2003, Pelot et al 2020, Gupta et al 2022). Our anatomy-aware loss function yielded segmentations with better anatomical integrity than baseline methods. It preserved fascicle connectivity (figure 5(b)) and reduced topological errors such as broken fascicles (figure 5(c)). Our ablation study separated these effects. The anatomy-aware loss significantly reduced the number of voxels violating structural rules, while the major gains in fascicle connectivity came from adopting the 3D architecture (table S5 in supplementary note 6).
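One way to express the enclosure rule as a differentiable term is to penalize fascicle probability that sits next to background probability. Anatomically, a fascicle should always be separated from the background by epineurium. The NumPy sketch below illustrates this idea only; it is not the authors' loss function.

- The class order (0 background, 1 epineurium, 2 fascicle) is an assumption.
- In training, the same arithmetic would be written with the deep learning framework's tensors and added, with a weighting factor, to a standard segmentation loss.
- A hard voxel count of the same rule is included as an illustrative violation rate.

```python
import numpy as np

BG, EPI, FASC = 0, 1, 2  # hypothetical class order of the network output


def _neighbor_slices(ndim: int, axis: int, n: int):
    """Index tuples selecting the 'lower' and 'upper' voxel of every face-adjacent pair along one axis."""
    lo = [slice(None)] * ndim
    hi = [slice(None)] * ndim
    lo[axis], hi[axis] = slice(0, n - 1), slice(1, n)
    return tuple(lo), tuple(hi)


def adjacency_penalty(probs: np.ndarray) -> float:
    """Mean soft violation over neighbor pairs; probs has shape (n_classes, *spatial), summing to 1 over classes."""
    p_bg, p_fa = probs[BG], probs[FASC]
    total, n_pairs = 0.0, 0
    for axis in range(p_bg.ndim):
        lo, hi = _neighbor_slices(p_bg.ndim, axis, p_bg.shape[axis])
        # Fascicle directly beside background is anatomically implausible, in either order.
        total += float(np.sum(p_fa[lo] * p_bg[hi] + p_bg[lo] * p_fa[hi]))
        n_pairs += p_bg[lo].size
    return total / max(n_pairs, 1)


def violation_rate(labels: np.ndarray) -> float:
    """Percentage of fascicle voxels with a face-adjacent background voxel (hard version of the penalty)."""
    fasc, bg = labels == FASC, labels == BG
    touches_bg = np.zeros_like(fasc)
    for axis in range(labels.ndim):
        lo, hi = _neighbor_slices(labels.ndim, axis, labels.shape[axis])
        touches_bg[lo] |= bg[hi]
        touches_bg[hi] |= bg[lo]
    n = np.count_nonzero(fasc)
    return 100.0 * np.count_nonzero(fasc & touches_bg) / n if n else 0.0
```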

Prior studies have also integrated anatomical knowledge into deep learning networks to improve segmentation beyond pixel classification (Hu et al 2019, Gupta et al 2022). This principle, implemented through loss functions, network architectures, or postprocessing, has proven effective for structures with relatively predictable global topologies. Examples include tubular blood vessels (Shit et al 2021), cardiac chambers (Oktay et al 2018), and brain subcortical areas (Lorio et al 2016). Our work extends this paradigm to peripheral nerve morphometry. It suggests that enforcing basic local spatial relationships of neural tissues can effectively regularize segmentation and improve anatomical fidelity for complex vagal structures.

4.3. Task-specific metrics guide meaningful performance assessment

In addition to our anatomy-aware loss function used for network training, we also implemented anatomy-aware metrics for post hoc evaluation of segmentation accuracy. Standard pixel-based segmentation metrics, such as DSC, often fail to reflect the practical utility of segmentation outputs (Müller et al 2022). For instance, object-level evaluation is a common approach in cell segmentation (Caicedo et al 2019) and general computer vision (Kirillov et al 2019), but it has rarely been applied to peripheral nerve morphology. Among studies using 2D CNNs, Buyukcelik et al (2023) and Verardo et al (2025) reported fascicle-level accuracy in cross sections. Our evaluation went beyond geometric measures to include 2D and 3D metrics essential for PNS modeling, neuroanatomical pathway mapping, and morphological measurements. Specifically, compared to a 2D U-Net, our 3D U-Net:

- improved detection of fascicles of different sizes (figures 4(b) and (e));
- preserved fascicle connectivity (figure 5(b));
- reduced structural violations (figure 5(c));
- reduced discontinuities between transverse cross sections (figure 6(a));
- made nearly 6-fold fewer errors in detecting fascicle split/merge events (figure 6(b)).

These corrections were anatomically important (e.g. separating erroneously merged fascicles) but produced only marginal increases in DSC (figure 3(b)). Global metrics such as DSC are dominated by bulk volume and are relatively insensitive to the small fraction of boundary pixels that determine fascicular structures. Task-specific evaluation is therefore crucial. A segmentation with a high DSC may still contain disconnected fascicles or abrupt cross-sectional changes, which hinder both longitudinal fascicle tracking and computational modeling.
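A toy example makes this insensitivity concrete. Below, two synthetic 20 × 20 pixel fascicles are separated by a one-pixel gap. The ‘prediction’ is perfect except that it fills the gap. The numbers are illustrative and do not come from the study's data.

```python
from collections import deque

import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Pixel-wise Dice similarity coefficient of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2 * np.count_nonzero(a & b) / (np.count_nonzero(a) + np.count_nonzero(b))


def count_components(mask: np.ndarray) -> int:
    """Number of 4-connected foreground components (plain BFS; scipy.ndimage.label does the same job)."""
    mask = mask.astype(bool)
    seen = np.zeros_like(mask)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        count += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1] and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
    return count


gt = np.zeros((40, 60), dtype=bool)
gt[10:30, 5:25] = True   # fascicle A
gt[10:30, 26:46] = True  # fascicle B, separated from A by a one-pixel gap (column 25)
pred = gt.copy()
pred[10:30, 25] = True   # the prediction bridges the gap, merging the two fascicles

print(round(dice(gt, pred), 3), count_components(gt), count_components(pred))  # 0.988 2 1
```

A DSC of about 0.99 hides the fact that the fascicle count dropped from two to one. Object-level and topology-aware metrics are designed to expose exactly this kind of error.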

Inter-slice jitter poses an important challenge to computational modeling of PNS, particularly for meshing the nerve geometry in a finite element model. We quantified inter-slice jitter with the BF score, a metric that is sensitive to changes in contour connectivity (Csurka et al 2013). However, the BF score decreases both when the network introduces discontinuities across slices and during true fascicle splitting and merging events. This ambiguity can make it difficult to separate network-induced errors (jitter) from actual anatomical changes (splits/merges). We did analyze the accuracy of splits and merges separately (figures 6(b) and (e)). Still, inter-slice jitter could be assessed more definitively by applying the BF score, or an alternative metric, only to portions of fascicles without splits or merges.

4.4. Segmentation accuracy is critical to neuroanatomical analyses and computational modeling

The accuracy of nerve morphology segmentation has important implications for anatomical analyses. Our automated 3D segmentation approach reduces boundary errors and enhances fascicle detection, thus providing a significantly more reliable source of morphometrics (e.g. fascicle count, size, spatial organization). This gain in accuracy is critical, as variations in morphology have important effects on neural responses to bioelectronic therapies (Grinberg et al 2008, Davis et al 2023). The ability to automatically extract precise morphology from high-resolution images enables analyses of anatomical variations within the vagus nerve across different locations, individuals, and species (Pelot et al 2020).

The 3D U-Net also stabilizes inter-slice boundaries, preserves fascicle connectivity, and accurately detects splitting and merging events, all of which are important for mapping the functional organization of peripheral nerves by proximally tracking fascicles that innervate specific organs and tissues. For example, histology and/or microCT have been used to map the functional organization of the pig vagus nerve (Settell et al 2020, Jayaprakash et al 2023, Thompson et al 2023, 2025) and nerves of the upper limb in humans (Sunderland 1945). However, fascicles in those nerves split and merge far less frequently than in the human vagus nerve.

The anatomical accuracy of nerve segmentations is crucial for computational modeling of PNS. Computational models enable the examination of mechanisms of action of neural stimulation therapies, as well as the design of electrode geometries, placement, and stimulation parameters (Wongsarnpigoon and Grill 2010, Ackermann et al 2011, Schiefer et al 2012, Romeni et al 2020, Aristovich et al 2021). Segmentation artifacts (such as incorrect fascicle merging) can propagate errors into predicted fiber recruitment by finite element and fiber models (Verardo et al 2025). Therefore, the accurate extraction of nerve morphology is a prerequisite for reliable model-based design.

Further, vagal mapping derived from accurate 3D segmentation is foundational for improving clinical therapies via computational modeling. This mapping characterizes the functional roles of fascicles by tracing their connectivity through the 3D morphological structure to nerve branches targeting specific organs (e.g., heart, larynx, stomach) and thus identifying on- and off-target pathways. The resulting functional neural map can guide computational models to optimize spatial selectivity for clinical relevance. For example, the efficacy of VNS is limited by off-target activation of fibers that course in the recurrent laryngeal nerve, resulting in coughing, hoarseness, and voice alterations (Ben-Menachem 2001, Nicolai et al 2020, Settell et al 2020). By integrating functional mapping into biophysical models of VNS, researchers can design novel electrode geometries and stimulation parameters that maximize selectivity and minimize off-target effects to improve therapeutic efficacy (Aristovich et al 2021, Hussain et al 2024).

Computational models of PNS typically define the nerve morphology by extruding a single cross section, thus assuming constant cross-sectional morphology (Helmers et al 2012, Bucksot et al 2021, Musselman et al 2023, Tebcherani et al 2024). Conversely, a recent study developed a pipeline to model true 3D nerve morphology based on segmented microCT data (Marshall et al 2025, in review), motivated by the observation that the morphology of human vagus nerves changes every ∼0.56 mm (Upadhye et al 2022). Indeed, the use of microCT is becoming commonplace as a complement to histology to investigate peripheral neuroanatomy (Thompson et al 2020, Upadhye et al 2022, Jayaprakash et al 2023). This convergence of advancements in computational modeling and imaging will benefit from segmentation algorithms that can robustly, accurately, and efficiently process imaging data into simulation inputs.

Application to segmentation of other imaging data

4.5.

We expect our 3D U-Net segmentation approach to translate well to datasets with varied staining methods, nerve types, and species. For example, nerves are often stained with agents other than PTA for microCT imaging, such as osmium tetroxide (Upadhye et al 2022) or Lugol’s iodine (Thompson et al 2020, Jayaprakash et al 2023); nerves other than vagus are also established targets for implanted neuromodulation devices, including the tibial nerve (Delianides et al 2020, Rogers et al 2021), sacral nerve (Li et al 2016), hypoglossal nerve (Strollo et al 2014), and median nerve (Tan et al 2015); and animal models are often used for therapy development despite their different morphologies from humans (Pelot et al 2020, Settell et al 2020). In all cases, our anatomy-aware loss function and evaluation metrics can be applied directly, and our trained 3D U-Net may provide satisfactory results. However, in some cases—e.g., microCT images acquired with different stains or nerves with morphologies that differ substantially from human vagus nerves—the 3D U-Net may require either de novo training or transfer learning with sparser, dataset-specific ground truth data (Verardo et al 2025). Quantitative analyses of peripheral neuromodulation targets will benefit from our improved segmentation models and metrics.

Our 3D U-Net approach is more demanding than 2D U-Nets. It requires more time to generate ground truth annotations and more computational resources for training and inference. Generating 3D ground truth annotations for training is more labor-intensive than 2D: it takes 30 min to 1.5 h per 64-slice (0.73 mm-long) volume, while 2D annotation typically takes less than 5 min per cross section. In the present work, we used a semi-automated workflow with a vision foundation model (µSAM) to segment the high-contrast epineurium, which significantly accelerated annotation. Emerging foundation models, particularly those extended to 3D or fine-tuned on microscopy images (Archit et al 2025), may enable efficient, human-in-the-loop annotation of more complex features like fascicles instead of full manual segmentation. Further, 3D networks have more parameters and process larger images, so training and inference in 3D are generally slower than in 2D on an equivalent device. Memory constraints in 3D may also require small batch sizes and patch-wise prediction. Patch-wise prediction in turn introduces the computational challenge of reconstructing the whole volume while smoothly blending the predicted patches. For the human vagus nerve, the demands of the 3D U-Net are warranted given its frequent fascicle splits and merges (Upadhye et al 2022). These are not adequately captured by a single 2D image or by a stack of independently segmented 2D images. The value of the 3D U-Net is shown by its lower inter-slice jitter compared to both the 2D U-Net and ground truth (figure 6(a)), in addition to its overall higher segmentation quality and anatomical accuracy. For other nerves and objectives, it is worth considering whether a 2D U-Net suffices.
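Patch-wise inference with smooth blending is commonly handled in two steps. Each patch prediction is weighted by a Gaussian that decays toward the patch borders, as in nnU-Net (Isensee et al 2021), and the accumulated result is divided by the accumulated weights. The sketch below rests on several assumptions:

- the output is single-channel;
- the volume is at least as large as the patch in every dimension;
- `predict_fn` is a placeholder standing in for the trained network;
- the patch size, overlap, and Gaussian width are hypothetical defaults.

```python
import numpy as np


def gaussian_weights(patch: tuple, sigma_frac: float = 0.125) -> np.ndarray:
    """Separable Gaussian importance map centered on the patch; borders get small but nonzero weight."""
    w = np.ones(patch)
    for axis, n in enumerate(patch):
        x = np.arange(n) - (n - 1) / 2
        g = np.exp(-0.5 * (x / (sigma_frac * n)) ** 2)
        shape = [1] * len(patch)
        shape[axis] = n
        w = w * g.reshape(shape)
    return np.maximum(w / w.max(), 1e-6)  # floor avoids vanishing weights at patch corners


def window_starts(size: int, patch: int, step: int) -> list:
    """Start indices that tile [0, size) with the last window flush against the end."""
    starts = list(range(0, size - patch + 1, step))
    if starts[-1] + patch < size:
        starts.append(size - patch)
    return starts


def sliding_window_predict(volume, predict_fn, patch=(64, 64, 64), overlap=0.5):
    """Run predict_fn on overlapping 3D patches and blend the results into one full-size volume."""
    assert all(s >= p for s, p in zip(volume.shape, patch)), "volume smaller than patch"
    steps = [max(1, int(p * (1 - overlap))) for p in patch]
    weights = gaussian_weights(patch)
    out = np.zeros(volume.shape, dtype=float)
    norm = np.zeros(volume.shape, dtype=float)
    for z in window_starts(volume.shape[0], patch[0], steps[0]):
        for y in window_starts(volume.shape[1], patch[1], steps[1]):
            for x in window_starts(volume.shape[2], patch[2], steps[2]):
                sl = (slice(z, z + patch[0]), slice(y, y + patch[1]), slice(x, x + patch[2]))
                out[sl] += weights * predict_fn(volume[sl])
                norm[sl] += weights
    return out / norm  # weighted average of all overlapping patch predictions
```

Overlapping windows (50% here) trade extra forward passes for smoother seams between patches. With a 64-voxel patch and 11.4 μm voxels, each window spans about 0.73 mm of nerve, matching the length of the annotated 64-slice volumes.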

Conclusion

We developed an enhanced deep learning approach, featuring a 3D U-Net CNN trained with an anatomy-aware loss function, for segmenting human vagus nerve fascicles and epineurium from microCT images. Compared to a 2D U-Net, our 3D U-Net provided a more reliable structural representation of the vagus nerve with improved segmentation accuracy, fascicle detection, and anatomical coherence. It also demonstrated enhanced performance in task-specific metrics critical to neuroanatomical analysis, including inter-slice consistency and accurate detection of fascicle branching events. These improvements enable more robust quantification of complex vagal morphology and provide high-fidelity 3D anatomical inputs essential for developing clinically relevant computational models of VNS. By automatically and accurately capturing the 3D vagus nerve morphology, this technique will provide the throughput required for large-scale characterization of intra- and inter-individual anatomical variability that influences VNS outcomes. Adaptable to other datasets, nerve targets, and species, this segmentation tool represents a critical advance within the broader strategy of leveraging high-resolution imaging and in silico modeling pipelines to support the rational design of precise, personalized neuromodulation therapies.

Bibliography (82 references)


1. Ackermann D M, Bhadra N, Foldes E L and Kilgore K L 2011 Conduction block of whole nerve without onset firing using combined high frequency and direct current Med. Biol. Eng. Comput. 49 241–51 doi:10.1007/s11517-010-0679-x (PMID 20890673; PMC3438896)
2. Archit A et al 2025 Segment anything for microscopy Nat. Methods 22 579–91 doi:10.1038/s41592-024-02580-4 (PMID 39939717; PMC11903314)
3. Aristovich K et al 2021 Model-based geometrical optimisation and in vivo validation of a spatially selective multielectrode cuff array for vagus nerve neuromodulation J. Neurosci. Methods 352 109079 doi:10.1016/j.jneumeth.2021.109079 (PMID 33516735)
4. Assmann R, Douven P, Kleijnen J, van Koeveringe G A, Joosten E A, Melenhorst J and Breukink S O 2020 Stimulation parameters for sacral neuromodulation on lower urinary tract and bowel dysfunction–related clinical outcome: a systematic review Neuromodulation 23 1082–93 doi:10.1111/ner.13255 (PMID 32830414; PMC7818464)
5. Ben-Menachem E 2001 Vagus nerve stimulation, side effects, and long-term safety J. Clin. Neurophysiol. 18 415 doi:10.1097/00004691-200109000-00005 (PMID 11709646)
6. Ben-Menachem E, Mañon-Espaillat R, Ristanovic R, Wilder B J, Stefan H, Mirza W, Tarver W B, Wernicke J F and the First International Vagus Nerve Stimulation Study Group 1994 Vagus nerve stimulation for treatment of partial seizures: 1. A controlled study of effect on seizures Epilepsia 35 616–26 doi:10.1111/j.1528-1157.1994.tb02482.x (PMID 8026408)
7. Bonaz B, Sinniger V and Pellissier S 2021 Therapeutic potential of vagus nerve stimulation for inflammatory bowel diseases Front. Neurosci. 15 650971 doi:10.3389/fnins.2021.650971 (PMID 33828455; PMC8019822)
8. Brill N A and Tyler D J 2017 Quantification of human upper extremity nerves and fascicular anatomy Muscle Nerve 56 463–71 doi:10.1002/mus.25534 (PMID 28006854; PMC5712902)