Unsupervised Stereo via Multi-Baseline Geometry-Consistent Self-Training
Peng Xu, Zhiyu Xiang, Tingming Bai, Tianyu Pu, Kai Wang, Chaojie Ji, Zhihao Yang, Eryun Liu

TL;DR
This paper introduces S$^3$, a novel self-training framework for stereo networks that leverages multi-baseline geometry consistency and asymmetric target views to improve supervision, especially in occluded regions.
Contribution
The paper proposes S$^3$, a new self-training method using multi-baseline geometry and asymmetric views, enhancing supervision in occluded areas for stereo depth estimation.
Findings
Outperforms previous methods on KITTI benchmarks
Effective in occluded and non-occluded regions
Introduces MBS20K dataset for training
Abstract
Photometric loss and pseudo-label-based self-training are two widely used methods for training stereo networks on unlabeled data. However, they both struggle to provide accurate supervision in occluded regions. The former lacks valid correspondences, while the latter's pseudo labels are often unreliable. To overcome these limitations, we present S$^3$, a simple yet effective framework based on multi-baseline geometry consistency. Unlike conventional self-training where teacher and student share identical stereo pairs, S$^3$ assigns them different target images, introducing natural visibility asymmetry. Regions occluded in the student's view often remain visible and matchable to the teacher, enabling reliable pseudo labels even in regions where photometric supervision fails. The teacher's disparities are rescaled to align with the student's baseline and used to guide student learning. An…
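The rescaling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes rectified, collinear cameras that share one reference image and one focal length, so that disparity follows d = f * B / Z and is therefore proportional to the baseline B. The function name and the example baselines below are hypothetical and are not taken from the paper; the authors' actual implementation may differ.

```python
import numpy as np


def rescale_disparity(teacher_disp, teacher_baseline, student_baseline):
    """Map a teacher disparity map onto the student's baseline.

    Assumption (not from the paper): both stereo pairs are rectified, share
    the same reference image and focal length, so d = f * B / Z and the
    disparity scales linearly with the baseline B.
    """
    if teacher_baseline <= 0 or student_baseline <= 0:
        raise ValueError("baselines must be positive")
    scale = student_baseline / teacher_baseline
    return np.asarray(teacher_disp, dtype=np.float64) * scale


# Hypothetical example: the teacher sees a wide pair (0.54 m baseline),
# the student a narrow one (0.27 m), so pseudo labels are halved.
teacher_disp = np.array([[40.0, 20.0], [10.0, 4.0]])
pseudo_label = rescale_disparity(teacher_disp, 0.54, 0.27)
```

Because the scaling is a single multiplicative factor, the rescaled map can serve directly as a pseudo label at the student's resolution, provided the assumptions above hold.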
