TL;DR
BiFuse++ is a self-supervised framework that combines bi-projection fusion with a new fusion module and a Contrast-Aware Photometric Loss, improving 360 depth estimation from monocular videos while reducing data collection costs.
Contribution
The paper presents BiFuse++, the first framework to integrate bi-projection fusion into self-training for monocular 360 depth estimation, improving both performance and training stability.
Findings
Achieves state-of-the-art results on benchmark datasets.
Improves self-training stability with Contrast-Aware Photometric Loss.
Reduces data collection costs for 360 depth estimation.
Abstract
With the rise of spherical cameras, monocular 360 depth estimation has become an important technique for many applications (e.g., autonomous systems). State-of-the-art frameworks for monocular 360 depth estimation, such as the bi-projection fusion in BiFuse, have therefore been proposed. Training such a framework requires a large number of panoramas along with corresponding depth ground truths captured by laser sensors, which greatly increases the cost of data collection. Moreover, since this data collection procedure is time-consuming, scaling these methods to different scenes becomes a challenge. Self-training a network for monocular depth estimation from 360 videos is one way to alleviate this issue. However, there are no existing frameworks that incorporate bi-projection fusion into the self-training scheme, which severely limits the self-supervised…
