# Integrating Dynamic Representation and Multi-Priors for Transnasal Intubation via Visual Foundation Model

**Authors:** Jinyu Liu, Yang Zhou, Ruoyi Hao, Mingying Li, Yang Zhang, Hongliang Ren

PMC · DOI: 10.3390/bioengineering13020217 · Bioengineering · 2026-02-13

## TL;DR

This paper introduces Glottis-SAM, a new framework for accurate and efficient glottis localization during transnasal intubation using a visual foundation model.

## Contribution

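The main contribution is a lightweight segmentation framework that combines dynamic representation learning with multi-prior contextual modeling. It adapts a visual foundation model (the Segment Anything Model) through a hierarchical low-rank adaptation strategy and a feature aggregation module built on dual-path dynamic feature pyramids.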

## Key findings

- Glottis-SAM achieves 72.6% mDice segmentation accuracy on clinical data.
- The model has a compact size of 55.2 MB and an inference speed of 44.3 FPS.
- Across three diverse datasets, the authors report state-of-the-art segmentation accuracy, which they attribute to the model's robustness and generalization under varying anatomical conditions.
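
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean Dice (mDice) score can be computed over binary masks. It assumes mDice is the per-image Dice coefficient averaged over a set of images; the summary does not state whether the paper averages over images or classes. The function names `dice` and `mean_dice` are illustrative and do not come from the paper.

```python
def dice(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same shape")
    intersection = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(preds, targets):
    """Average the per-image Dice score over paired masks (assumed definition of mDice)."""
    scores = [dice(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    # One overlapping pixel, three foreground pixels in total: 2 * 1 / 3.
    print(dice(pred, target))
```

A reported value of 72.6% corresponds to a score of 0.726 on this 0 to 1 scale.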

## Abstract

Accurate and real-time glottis localization is critical for ensuring intraoperative oxygenation and patient safety during nasotracheal intubation. However, representative foundation models exemplified by the Segment Anything Model exhibit notable limitations in medical applications, stemming from their rigid attention mechanisms, feature space misalignment, and insufficient generalization to complex glottal anatomies. To address these challenges, we propose Glottis-SAM, a lightweight and task-adaptive segmentation framework that integrates dynamic representation learning with multi-prior contextual modeling. Specifically, we introduce a hierarchical low-rank adaptation strategy that enables efficient fine-tuning of visual foundation models by preserving geometric priors while significantly reducing computational overhead. To further enhance semantic fusion and generalization, we design a feature aggregation module with dual-path dynamic feature pyramids, which enables complementary optimization from local textures to global semantic structures under varying anatomical conditions. Extensive experiments on three diverse datasets demonstrate that Glottis-SAM achieves state-of-the-art segmentation accuracy with 72.6% mDice, a compact 55.2 MB model size, and 44.3 FPS inference speed on clinical data. These results highlight the model’s robustness, efficiency, and potential for deployment in visual guidance systems for nasotracheal intubation.
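
The abstract names a hierarchical low-rank adaptation strategy but this summary does not describe its details. As background only, the sketch below shows plain (non-hierarchical) low-rank adaptation of a single linear layer: the pretrained weight `W` stays frozen and a trainable update `B A` of rank `r` is added, scaled by `alpha / r`. The class `LoRALinear`, its parameters, and the initialization are assumptions for illustration, not the authors' implementation.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        self.scale = alpha / rank
        # A gets a small random init and B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Number of parameters in A and B: r * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    W = [[float(i == j) for j in range(8)] for i in range(8)]  # 8x8 identity
    layer = LoRALinear(W, rank=2)
    print(layer.trainable_params(), "trainable vs", 8 * 8, "frozen")
```

The saving comes from training `r * (d_in + d_out)` values instead of `d_in * d_out`, which grows more favorable as layer dimensions increase relative to the rank.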

## Full-text entities

- **Genes:** VIT (vitrin) [NCBI Gene 5212] {aka VIT1}
- **Diseases:** tracheal injury (MESH:D008476), injury to (MESH:D014947), visual disturbances (MESH:D014786), laryngeal disorders (MESH:D007827), lesion (MESH:D009059), hypoxemia (MESH:D000860)
- **Chemicals:** FOB (-), oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12938067/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12938067/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12938067/full.md

---
Source: https://tomesphere.com/paper/PMC12938067