# ConsisTNet: a spatio-temporal approach for consistent anatomical localization in endoscopic pituitary surgery

**Authors:** Zhehua Mao, Adrito Das, Danyal Z. Khan, Simon C. Williams, John G. Hanrahan, Danail Stoyanov, Hani J. Marcus, Sophia Bano

PMC · DOI: 10.1007/s11548-025-03369-2 · International Journal of Computer Assisted Radiology and Surgery · 2025-04-29

## TL;DR

ConsisTNet improves consistent anatomical localization in pituitary surgery by using spatio-temporal features for more stable and accurate real-time guidance.

## Contribution

ConsisTNet introduces a novel spatio-temporal model with semi-supervised pseudo-labeling to enhance prediction consistency in endoscopic surgery.

## Key findings

- ConsisTNet improves segmentation consistency by 4.56% and 9.45% in IoU for the two segmentation regions.
- Landmark detection consistency is enhanced with a 43.86% reduction in mean distance error.
- The model achieves 202 FPS with FP16 precision, enabling real-time intraoperative use.

## Abstract

Automated localization of critical anatomical structures in endoscopic pituitary surgery is crucial for enhancing patient safety and surgical outcomes. While deep learning models have shown promise in this task, their predictions often suffer from frame-to-frame inconsistency. This study addresses this issue by proposing ConsisTNet, a novel spatio-temporal model designed to improve prediction stability.

ConsisTNet leverages spatio-temporal features extracted from consecutive frames to provide both temporally and spatially consistent predictions, addressing the limitations of single-frame approaches. We employ a semi-supervised strategy, utilizing ground-truth label tracking for pseudo-label generation through label propagation. Consistency is assessed by comparing predictions across consecutive frames using predicted label tracking. The model is optimized and accelerated using TensorRT for real-time intraoperative guidance.
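The two uses of tracking described above (propagating ground-truth labels to unlabeled frames to make pseudo-labels, and tracking predictions forward to score frame-to-frame consistency) can be illustrated with a minimal sketch. It assumes a pure integer translation per frame pair as a stand-in for the tracker; the helper names (`translate`, `propagate_pseudo_labels`, `temporal_consistency_iou`, `landmark_consistency`) and the exact metric definitions are hypothetical and are not taken from the paper, which may use a different tracker and averaging scheme.

```python
import numpy as np


def translate(mask, dy, dx):
    """Hypothetical tracker stand-in: shift a 2D mask by (dy, dx), zero-filling vacated pixels."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the whole mask moved out of frame
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = mask[y0:y1, x0:x1]
    return out


def iou(a, b):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def propagate_pseudo_labels(gt_mask, motions):
    """Pseudo-labels for the following unlabeled frames, obtained by tracking the ground-truth mask."""
    labels = [gt_mask]
    for dy, dx in motions:
        labels.append(translate(labels[-1], dy, dx))
    return labels


def temporal_consistency_iou(preds, motions):
    """Mean IoU between the prediction at t tracked to t+1 and the prediction at t+1."""
    scores = [
        iou(translate(preds[t], *motions[t]), preds[t + 1])
        for t in range(len(preds) - 1)
    ]
    return float(np.mean(scores))


def landmark_consistency(points, motions):
    """Mean Euclidean distance between the landmark at t tracked to t+1 and the landmark predicted at t+1."""
    dists = []
    for t in range(len(points) - 1):
        tracked = np.asarray(points[t], dtype=float) + np.asarray(motions[t], dtype=float)
        dists.append(np.linalg.norm(tracked - np.asarray(points[t + 1], dtype=float)))
    return float(np.mean(dists))
```

For example, propagating a 3×3 square with motions `[(1, 1), (1, 1)]` yields pseudo-labels that are perfectly consistent (IoU 1.0). A prediction that drifts one extra pixel sideways scores an IoU of 0.5 against the tracked mask. Higher tracked IoU and lower tracked landmark distance both indicate steadier predictions, which is the property the consistency metrics are meant to capture.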

Compared to previous state-of-the-art models, ConsisTNet significantly improves prediction consistency across video frames while maintaining high accuracy in segmentation and landmark detection. Specifically, segmentation consistency is improved by 4.56% and 9.45% in IoU for the two segmentation regions, and landmark detection consistency is enhanced with a 43.86% reduction in mean distance error. The accelerated model achieves an inference speed of 202 frames per second (FPS) with 16-bit floating point (FP16) precision, enabling real-time intraoperative guidance.
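To make the reported figures concrete, the arithmetic below assumes that the 43.86% figure is a relative reduction of the mean tracked landmark distance with respect to the baseline, and that 202 FPS is a throughput figure whose inverse gives an average per-frame compute budget rather than end-to-end latency. The symbols $\bar d_{\text{baseline}}$ and $\bar d_{\text{ConsisTNet}}$ are illustrative; the underlying distances are not reported in this summary.

$$
\frac{\bar d_{\text{baseline}} - \bar d_{\text{ConsisTNet}}}{\bar d_{\text{baseline}}} = 0.4386
\;\Longrightarrow\;
\bar d_{\text{ConsisTNet}} = 0.5614\,\bar d_{\text{baseline}},
\qquad
t_{\text{frame}} = \frac{1}{202\ \text{s}^{-1}} \approx 4.95\ \text{ms}
$$

Under these assumptions, the per-frame time is well within the roughly 33 ms available per frame of a 30 FPS endoscopic video stream.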

ConsisTNet demonstrates significant improvements in spatio-temporal consistency of anatomical localization during endoscopic pituitary surgery, providing more stable and reliable real-time surgical assistance.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12167350/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12167350/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12167350/full.md

---
Source: https://tomesphere.com/paper/PMC12167350