Cross-Vehicle 3D Geometric Consistency for Self-Supervised Surround Depth Estimation on Articulated Vehicles
Weimin Liu, Jiyuan Qiu, Wenjun Wang, Joshua H. Meng

TL;DR
This paper introduces ArticuSurDepth, a self-supervised surround-view depth estimation framework tailored for articulated vehicles, leveraging geometric consistency and structural priors to improve 3D perception in autonomous driving.
Contribution
It presents a novel self-supervised method that incorporates cross-view and cross-vehicle geometric constraints, specifically designed for articulated vehicles, with validation on a new dataset and benchmarks.
Findings
Achieved state-of-the-art depth estimation performance on multiple datasets.
Enhanced structural coherence through multi-view spatial context and surface normal constraints.
Demonstrated effectiveness on a newly established articulated vehicle dataset.
Abstract
Surround depth estimation provides a cost-effective alternative to LiDAR for 3D perception in autonomous driving. While recent self-supervised methods explore multi-camera settings to improve scale awareness and scene coverage, they are primarily designed for passenger vehicles and rarely consider articulated vehicles or robotics platforms. The articulated structure introduces complex cross-segment geometry and motion coupling, making consistent depth reasoning across views more challenging. In this work, we propose ArticuSurDepth, a self-supervised framework for surround-view depth estimation on articulated vehicles that enhances depth learning through cross-view and cross-vehicle geometric consistency guided by structural priors from a vision foundation model. Specifically, we introduce a multi-view spatial context enrichment strategy and a cross-view surface normal constraint to…
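The abstract is truncated before it defines the cross-view surface normal constraint, so the following is only a minimal sketch of one common way such a constraint can be written, not the authors' formulation. It assumes a pinhole intrinsics matrix K, normals estimated from depth by finite differences on back-projected points, pixel-aligned normal maps for the two views (for example after warping), and a known rotation R_ab from camera b to camera a. The function names are hypothetical, and NumPy is used for illustration only; a training pipeline would need a differentiable framework.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T  # each row is K^{-1} [u, v, 1]^T
    return rays * depth[..., None]

def normals_from_depth(depth, K):
    """Estimate unit surface normals (H-2, W-2, 3) from central differences."""
    pts = backproject(depth, K)
    dx = pts[1:-1, 2:] - pts[1:-1, :-2]  # horizontal tangent
    dy = pts[2:, 1:-1] - pts[:-2, 1:-1]  # vertical tangent
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12
    return n

def normal_consistency_loss(n_a, n_b, R_ab):
    """Mean (1 - cosine) between n_a and n_b rotated into frame a.

    Assumes n_a and n_b are already pixel-aligned (hypothetical setup).
    """
    n_b_in_a = n_b @ R_ab.T  # row-vector form of R_ab @ n
    cos = np.sum(n_a * n_b_in_a, axis=-1)
    return float(np.mean(1.0 - cos))

# Usage: a fronto-parallel plane seen by two cameras with identical orientation.
K = np.array([[100.0, 0.0, 8.0], [0.0, 100.0, 6.0], [0.0, 0.0, 1.0]])
depth = np.full((12, 16), 5.0)
n = normals_from_depth(depth, K)
loss_same = normal_consistency_loss(n, n, np.eye(3))
```

The loss is zero when the two normal maps agree after rotation into a shared frame and grows as they diverge, which is the sense in which a normal constraint can encourage structural coherence across overlapping views.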
