# X2P-Net: Context-Aware 2D/3D Vertebra Localization

**Authors:** Rong Tao, Kangqing Ye, Weijun Zhang, Wenyuan Sun, Derong Yu, Donghua Hang, Guoyan Zheng

PMC · DOI: 10.3390/bioengineering13020178 · Bioengineering · 2026-02-03

## TL;DR

X2P-Net estimates the 3D positions of vertebrae from intraoperative 2D X-ray images during minimally invasive spine surgery, so that preoperative data can be aligned with the patient's real-time posture.

## Contribution

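X2P-Net is a prompt-guided, semantic context-enhanced framework for 2D/3D vertebra detection. Its core is BrickFormer, a Transformer architecture that uses a dual-attention mechanism to extract refined vertebral foreground context at low computational cost.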

## Key findings

- On the synthetic BiSpineX dataset, X2P-Net achieves detection percentages of 96.9% and 98.8% at the 10 mm and 20 mm thresholds, respectively.
- On BiSpineX, the mean position error is 2.99 mm and the AUC is 0.9923.
- On the sheep spine dataset SheepSpineX, it achieves 98.4% and 100.0% at the 10 mm and 20 mm thresholds, respectively, with a mean position error of 1.08 mm and an AUC of 0.9972.
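
The threshold percentages and mean position error reported above can be computed from per-landmark errors roughly as follows. This is a minimal sketch, not the authors' evaluation code: it assumes the error is the Euclidean distance in millimetres between predicted and ground-truth 3D points, and that a landmark counts at a threshold when its error is less than or equal to it. The function names and toy coordinates are made up for illustration. AUC is left out because this summary does not say how it is defined.

```python
import math

def position_errors(pred, gt):
    """Euclidean distance (mm) between each predicted and ground-truth 3D point."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]

def mean_position_error(errors):
    """Average of the per-landmark errors (mm)."""
    return sum(errors) / len(errors)

def percent_within(errors, threshold_mm):
    """Share of landmarks (in %) whose error is at most threshold_mm (assumed rule)."""
    hits = sum(1 for e in errors if e <= threshold_mm)
    return 100.0 * hits / len(errors)

# Toy data (hypothetical): four vertebra centres, coordinates in mm.
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 30.0), (0.0, 0.0, 60.0), (0.0, 0.0, 90.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 0.0, 30.0), (0.0, 12.0, 60.0), (0.0, 0.0, 115.0)]

errors = position_errors(pred, gt)
print(mean_position_error(errors))
print(percent_within(errors, 10.0))
print(percent_within(errors, 20.0))
```

With the toy data, the per-landmark errors are 5, 0, 12 and 25 mm, so two of four landmarks fall within 10 mm and three of four within 20 mm.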

## Abstract

In the context of minimally invasive spine surgery, accurately estimating the 3D coordinates of the vertebrae from intraoperative 2D X-ray images is crucial for aligning preoperative data with the patient’s real-time posture. However, existing methods are hindered by the ill-posed nature of 2D-to-3D localization and the distinctive anatomical features of the spinal column, leading to ambiguities and reduced accuracy. In this paper, we introduce X2P-Net, a novel prompt-guided and semantic context-enhanced 2D/3D vertebra detection framework. To achieve this, we design a novel Transformer architecture, referred to as BrickFormer, which can automatically extract the refined vertebral foreground context at low computational cost using a dual-attention mechanism. Comprehensive experiments were conducted to validate the proposed approach on two datasets: a large-scale synthetic dataset (BiSpineX) and a sheep spine dataset (SheepSpineX). Results obtained from these experiments demonstrate superior landmark localization performance of the proposed method compared to other state-of-the-art methods. Specifically, on the BiSpineX dataset, X2P-Net achieves percentages of 96.9% and 98.8% at 10 mm and 20 mm thresholds, respectively, a mean position error of 2.99 mm, and an AUC of 0.9923. Similar superior performance was also observed when the proposed method was applied to the SheepSpineX dataset, with percentages of 98.4% and 100.0% at 10 mm and 20 mm thresholds, respectively, a mean position error of 1.08 mm, and an AUC of 0.9972.
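
The ill-posed nature of 2D-to-3D localization mentioned in the abstract can be illustrated with an idealized pinhole projection, in which every 3D point along a ray from the X-ray source lands on the same detector pixel. The sketch below is generic geometry, not the imaging model used by X2P-Net; the `project` helper, the focal length, and the coordinates are all hypothetical.

```python
def project(point, focal_length_mm=1000.0):
    """Idealized pinhole projection: source at the origin, detector plane at z = focal_length_mm."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the source (z > 0)")
    return (focal_length_mm * x / z, focal_length_mm * y / z)

# Two different 3D points on the same ray through the source (hypothetical values, mm).
near = (10.0, 20.0, 500.0)
far = (20.0, 40.0, 1000.0)

# Both map to the same 2D detector coordinate, so depth cannot be recovered
# from this single projection alone.
print(project(near))
print(project(far))
```

Because both points project to the same 2D location, a single view does not determine depth; some extra information beyond one projection, such as another view or anatomical context, is needed to resolve it.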

## Full-text entities

- **Diseases:** FE (MESH:C564835), fractures (MESH:D050723), PCL (MESH:D000080041), injury to (MESH:D014947), fractured vertebra (MESH:C562952), spinal deformities (MESH:D013122), scoliosis (MESH:D012600), SCE (MESH:D057180)
- **Chemicals:** BiSpineX (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Ovis aries (domestic sheep, species) [taxon 9940], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12938846/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12938846/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC12938846/full.md

---
Source: https://tomesphere.com/paper/PMC12938846