Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data
Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li

TL;DR
This paper introduces Rt-OmniMVS, a real-time omnidirectional depth estimation method using teacher-student learning and a lightweight network, achieving high accuracy and efficiency for real-world 3D perception on edge devices.
Contribution
The paper presents a novel real-time omnidirectional depth estimation approach with a combined spherical sweeping method and a teacher-student training strategy leveraging unlabeled data.
Findings
Achieves 15 fps inference on edge platforms.
Comparable accuracy to state-of-the-art methods.
Effective in diverse indoor and outdoor scenarios.
Abstract
Omnidirectional depth estimation enables efficient 3D perception over a full 360-degree range. However, in real-world applications such as autonomous driving and robotics, achieving real-time performance and robust cross-scene generalization remains a significant challenge for existing algorithms. In this paper, we propose a real-time omnidirectional depth estimation method for edge computing platforms named Rt-OmniMVS, which introduces the Combined Spherical Sweeping method and implements the lightweight network structure to achieve real-time performance on edge computing platforms. To achieve high accuracy, robustness, and generalization in real-world environments, we introduce a teacher-student learning strategy. We leverage the high-precision stereo matching method as the teacher model to predict pseudo labels for unlabeled real-world data, and utilize data and model augmentation…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Industrial Vision Systems and Defect Detection · Robotics and Sensor-Based Localization
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
