TiS-TSL: Image-Label Supervised Surgical Video Stereo Matching via Time-Switchable Teacher-Student Learning
Rui Wang, Ying Zhou, Hao Wang, Wenwei Zhang, Qiang Li, Zhiwei Wang

TL;DR
This paper introduces TiS-TSL, a novel framework for stereo matching in surgical videos that leverages minimal supervision and temporal modeling to produce more stable and accurate disparity predictions across frames.
Contribution
The paper proposes a unified, time-switchable teacher-student learning framework that incorporates temporal consistency for surgical video stereo matching under sparse supervision.
Findings
Outperforms state-of-the-art image-based methods by at least 2.11% in TEPE.
Achieves at least 4.54% improvement in EPE.
Effectively enforces spatio-temporal coherence in disparity predictions.
Abstract
Stereo matching in minimally invasive surgery (MIS) is essential for next-generation navigation and augmented reality. Yet, dense disparity supervision is nearly impossible due to anatomical constraints, typically limiting annotations to only a few image-level labels acquired before the endoscope enters deep body cavities. Teacher-Student Learning (TSL) offers a promising solution by leveraging a teacher trained on sparse labels to generate pseudo labels and associated confidence maps from abundant unlabeled surgical videos. However, existing TSL methods are confined to image-level supervision, providing only spatial confidence and lacking temporal consistency estimation. This absence of spatio-temporal reliability results in unstable disparity predictions and severe flickering artifacts across video frames. To overcome these challenges, we propose TiS-TSL, a novel time-switchable…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
