TL;DR
This paper introduces a GPS-based loss function for monocular self-supervised depth estimation that makes predictions scale-consistent and scale-aware. GPS data is used only during training and is not required at inference.
Contribution
The paper proposes a dynamically-weighted GPS-to-Scale (g2s) loss that complements the appearance-based losses. The loss uses GPS data during training only, and the scale signal it provides is independent of the camera setup and scene distribution.
Findings
Scale-consistent and scale-aware depth estimation is demonstrated across multiple datasets.
Depth accuracy improves even when training with low-frequency GPS data.
The method does not require GPS at inference, making it practical for real-world applications.
Abstract
Dense depth estimation is essential to scene-understanding for autonomous driving. However, recent self-supervised approaches on monocular videos suffer from scale-inconsistency across long sequences. Utilizing data from the ubiquitously copresent global positioning systems (GPS), we tackle this challenge by proposing a dynamically-weighted GPS-to-Scale (g2s) loss to complement the appearance-based losses. We emphasize that the GPS is needed only during the multimodal training, and not at inference. The relative distance between frames captured through the GPS provides a scale signal that is independent of the camera setup and scene distribution, resulting in richer learned feature representations. Through extensive evaluation on multiple datasets, we demonstrate scale-consistent and -aware depth estimation during inference, improving the performance even when training with…
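The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the haversine distance, the squared-difference penalty, the weighting schedule in `dynamic_weight`, and the factor `kappa` are all assumptions chosen for illustration. The sketch compares the magnitude of a predicted inter-frame camera translation with the distance between the two GPS fixes, and adds that penalty to an appearance-based loss with a weight that grows over training.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def gps_distance_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in metres between two GPS fixes (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def g2s_penalty(pred_translation, gps_dist_m):
    """Squared gap between the predicted translation magnitude and the GPS distance.

    Assumed form of the penalty; pred_translation is a hypothetical (x, y, z)
    output of a pose network for the same pair of frames.
    """
    pred_norm = math.sqrt(sum(c * c for c in pred_translation))
    return (pred_norm - gps_dist_m) ** 2


def dynamic_weight(epoch, rate=1.0):
    """Hypothetical schedule: 0 at epoch 0, approaching 1 as training proceeds."""
    return 1.0 - math.exp(-rate * epoch)


def total_loss(appearance_loss, pred_translation, gps_dist_m, epoch, kappa=0.01):
    """Appearance loss plus the dynamically weighted GPS penalty (kappa is assumed)."""
    penalty = g2s_penalty(pred_translation, gps_dist_m)
    return appearance_loss + kappa * dynamic_weight(epoch) * penalty
```

Because the penalty only enters the training objective, nothing in this sketch is needed at inference, which matches the abstract's point that GPS is required only during training.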
