Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data

Ming Li; Xiong Yang; Chaofan Wu; Jiaheng Li; Pinzhi Wang; Xuejiao Hu; Sidan Du; Yang Li

arXiv:2409.07843·cs.CV·November 11, 2025

TL;DR

This paper introduces Rt-OmniMVS, a real-time omnidirectional depth estimation method using teacher-student learning and a lightweight network, achieving high accuracy and efficiency for real-world 3D perception on edge devices.

Contribution

The paper presents a real-time omnidirectional depth estimation approach that pairs the Combined Spherical Sweeping method with a teacher-student training strategy that leverages unlabeled data.

Findings

01. Achieves 15 fps inference on edge platforms.
02. Comparable accuracy to state-of-the-art methods.
03. Effective in diverse indoor and outdoor scenarios.

Abstract

Omnidirectional depth estimation enables efficient 3D perception over a full 360-degree range. However, in real-world applications such as autonomous driving and robotics, achieving real-time performance and robust cross-scene generalization remains a significant challenge for existing algorithms. In this paper, we propose Rt-OmniMVS, a real-time omnidirectional depth estimation method for edge computing platforms. It introduces the Combined Spherical Sweeping method and uses a lightweight network structure to achieve real-time performance on these platforms. To achieve high accuracy, robustness, and generalization in real-world environments, we introduce a teacher-student learning strategy. We leverage a high-precision stereo matching method as the teacher model to predict pseudo labels for unlabeled real-world data, and utilize data and model augmentation…
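
The pseudo-labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-pixel confidence input, and the 0.8 threshold are all assumptions made for illustration, and depth maps are flattened to plain lists to keep the example dependency-free. The idea shown is only that a teacher's depth predictions on unlabeled data become training targets, and the student's loss is averaged over the pixels that kept a pseudo label.

```python
from typing import List, Optional, Sequence


def make_pseudo_labels(
    teacher_depth: Sequence[float],
    confidence: Sequence[float],
    min_conf: float = 0.8,  # hypothetical threshold, not from the paper
) -> List[Optional[float]]:
    """Keep teacher depths whose confidence passes the threshold.

    Pixels that fail the check get None and are ignored during training.
    The confidence input itself is an assumption of this sketch.
    """
    return [
        d if c >= min_conf else None
        for d, c in zip(teacher_depth, confidence)
    ]


def masked_l1_loss(
    student_depth: Sequence[float],
    pseudo_labels: Sequence[Optional[float]],
) -> float:
    """Mean absolute error over pixels that have a pseudo label."""
    errors = [
        abs(s - t)
        for s, t in zip(student_depth, pseudo_labels)
        if t is not None
    ]
    # With no valid pixels there is nothing to supervise, so return zero.
    return sum(errors) / len(errors) if errors else 0.0


# Toy example: three pixels, the middle one is rejected as low confidence.
labels = make_pseudo_labels([2.0, 4.0, 10.0], [0.9, 0.5, 0.95])
loss = masked_l1_loss([2.5, 7.0, 9.0], labels)
print(labels, loss)
```

In a real training loop the same masking would typically be applied to tensors rather than lists, and the student would see augmented inputs; the abstract is truncated before it gives those details, so they are left out here.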

Taxonomy

Topics: Advanced Vision and Imaging · Industrial Vision Systems and Defect Detection · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings