BREATH-VL: Vision-Language-Guided 6-DoF Bronchoscopy Localization via Semantic-Geometric Fusion

Qingyao Tian; Bingyu Yang; Huai Liao; Xinyan Huang; Junyong Li; Dong Yi; Hongbin Liu

arXiv:2601.03713 · cs.CV · January 8, 2026

TL;DR

This paper introduces BREATH-VL, a hybrid vision-language framework for accurate 6-DoF bronchoscopy localization, leveraging a new in-vivo dataset and semantic-geometric fusion to improve accuracy and robustness in complex airway navigation.

Contribution

The paper presents two contributions: BREATH, the largest in-vivo endoscopic localization dataset to date, and BREATH-VL, a hybrid framework that combines semantic cues from vision-language models with geometric registration for 6-DoF pose estimation.

Findings

01 Reduces translational error by 25.5% compared to state-of-the-art methods.

02 Demonstrates robust semantic localization in challenging surgical scenes.

03 Achieves competitive computational latency with improved accuracy.

Abstract

Vision-language models (VLMs) have recently shown remarkable performance in navigation and localization tasks by leveraging large-scale pretraining for semantic understanding. However, applying VLMs to 6-DoF endoscopic camera localization presents several challenges: 1) the lack of large-scale, high-quality, densely annotated, and localization-oriented vision-language datasets in real-world medical settings; 2) limited capability for fine-grained pose regression; and 3) high computational latency when extracting temporal features from past frames. To address these issues, we first construct the BREATH dataset, the largest in-vivo endoscopic localization dataset to date, collected in the complex human airway. Building on this dataset, we propose BREATH-VL, a hybrid framework that integrates semantic cues from VLMs with geometric information from vision-based registration methods for accurate…

Taxonomy

Topics: Multimodal Machine Learning Applications · Surgical Simulation and Training · Advanced Neural Network Applications