EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams
Hao Li, Daiwei Lu, Jiacheng Wang, Robert J. Webster III, Ipek Oguz

TL;DR
EndoStreamDepth is a real-time, temporally consistent monocular depth estimation framework for endoscopic videos, producing accurate, sharp, and anatomically aligned depth maps to support robotic surgery tasks.
Contribution
It introduces a novel framework combining a single-frame depth network with multi-level temporal modules for improved accuracy and consistency in endoscopic video depth estimation.
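To illustrate how a temporal module can carry information from one frame to the next in a streaming setting, the sketch below blends each frame's features with a running hidden state. It is a minimal sketch under stated assumptions, not the paper's module: a selective state-space model such as Mamba makes its parameters depend on the input, whereas this example uses a fixed linear recurrence. The names `StreamingTemporalState`, `run_stream`, `mean_abs_frame_diff`, and the `decay` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for one frame's flattened feature map.
Frame = List[float]


@dataclass
class StreamingTemporalState:
    """Fixed linear recurrence: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    Assumption: a constant decay replaces the input-dependent parameters
    of a selective state-space model, purely to keep the sketch short.
    """
    decay: float = 0.8
    hidden: Optional[Frame] = None

    def step(self, features: Frame) -> Frame:
        if self.hidden is None:
            # First frame of the stream: there is no history to blend with.
            self.hidden = list(features)
        else:
            self.hidden = [
                self.decay * h + (1.0 - self.decay) * x
                for h, x in zip(self.hidden, features)
            ]
        return list(self.hidden)


def run_stream(frames: List[Frame], decay: float = 0.8) -> List[Frame]:
    """Process frames one at a time, keeping only the hidden state between calls."""
    state = StreamingTemporalState(decay=decay)
    return [state.step(f) for f in frames]


def mean_abs_frame_diff(seq: List[Frame]) -> float:
    """Average absolute change between consecutive frames (lower is steadier)."""
    diffs = [
        sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)
        for prev, cur in zip(seq, seq[1:])
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    flicker = [[1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    print(mean_abs_frame_diff(flicker))
    print(mean_abs_frame_diff(run_stream(flicker)))
```

Because only the hidden state is kept between calls, memory use does not grow with the length of the stream, which is the property that makes per-frame processing attractive for live video.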
Findings
Significantly outperforms existing monocular depth estimation methods.
Produces depth maps with sharp, anatomically aligned boundaries.
Achieves real-time processing suitable for surgical applications.
Abstract
This work presents EndoStreamDepth, a monocular depth estimation framework for endoscopic video streams. It provides accurate depth maps with sharp anatomical boundaries for each frame, temporally consistent predictions across frames, and real-time throughput. Unlike prior work that uses batched inputs, EndoStreamDepth processes individual frames with a temporal module to propagate inter-frame information. The framework contains three main components: (1) a single-frame depth network with an endoscopy-specific transformation to produce accurate depth maps, (2) multi-level Mamba temporal modules that leverage inter-frame information to improve accuracy and stabilize predictions, and (3) a hierarchical design with comprehensive multi-scale supervision, where complementary loss terms jointly improve local boundary sharpness and global geometric consistency. We conduct comprehensive…
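The idea of multi-scale supervision with complementary loss terms can be sketched as follows. The abstract does not name the loss terms, so everything here is an assumption chosen for illustration: an L1 depth term stands in for global geometric consistency, an L1 term on finite-difference gradients stands in for boundary sharpness, and coarser scales are obtained by 2x2 average pooling of a single prediction (a hierarchical network would more likely supervise a separate output per level). The function names and the `edge_weight` parameter are hypothetical.

```python
from typing import List

# An H x W depth map stored as nested lists.
Depth = List[List[float]]


def avg_pool2(d: Depth) -> Depth:
    """Halve resolution by averaging 2x2 blocks (assumes even H and W)."""
    return [
        [
            (d[i][j] + d[i][j + 1] + d[i + 1][j] + d[i + 1][j + 1]) / 4.0
            for j in range(0, len(d[0]), 2)
        ]
        for i in range(0, len(d), 2)
    ]


def l1(pred: Depth, target: Depth) -> float:
    """Mean absolute depth error over all pixels."""
    h, w = len(pred), len(pred[0])
    total = sum(abs(pred[i][j] - target[i][j]) for i in range(h) for j in range(w))
    return total / (h * w)


def gradient_l1(pred: Depth, target: Depth) -> float:
    """Mean absolute error between finite-difference gradients of pred and target."""
    h, w = len(pred), len(pred[0])
    total, count = 0.0, 0
    for i in range(h):          # horizontal differences
        for j in range(w - 1):
            dp = pred[i][j + 1] - pred[i][j]
            dt = target[i][j + 1] - target[i][j]
            total += abs(dp - dt)
            count += 1
    for i in range(h - 1):      # vertical differences
        for j in range(w):
            dp = pred[i + 1][j] - pred[i][j]
            dt = target[i + 1][j] - target[i][j]
            total += abs(dp - dt)
            count += 1
    return total / count if count else 0.0


def multi_scale_loss(pred: Depth, target: Depth,
                     num_scales: int = 2, edge_weight: float = 0.5) -> float:
    """Sum the depth term and the weighted edge term over successive scales."""
    total = 0.0
    for _ in range(num_scales):
        total += l1(pred, target) + edge_weight * gradient_l1(pred, target)
        h, w = len(pred), len(pred[0])
        if h < 2 or w < 2 or h % 2 or w % 2:
            break  # cannot pool any further
        pred, target = avg_pool2(pred), avg_pool2(target)
    return total


if __name__ == "__main__":
    step_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    blurred = [[0.5] * 4 for _ in range(4)]
    print(multi_scale_loss(step_edge, step_edge))
    print(multi_scale_loss(blurred, step_edge))
```

In this toy case the flat prediction is penalized by both terms at every scale: the depth term sees the offset from the true values, and the gradient term sees the missing edge.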
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Image Processing Techniques and Applications
