TL;DR
This paper introduces a method to generate stereo training data from single images using monocular depth estimation, reducing the need for ground truth stereo data and enabling training on diverse datasets like COCO.
Contribution
It presents a novel pipeline that creates stereo training pairs from single images, allowing effective stereo network training without relying on real or synthetic stereo data.
Findings
Outperforms traditional stereo training methods on the KITTI, ETH3D, and Middlebury datasets.
Enables training on diverse datasets like COCO for stereo matching.
Reduces human effort and data collection requirements.
Abstract
Supervised deep networks are among the best methods for finding correspondences in stereo image pairs. Like all supervised approaches, these networks require ground truth data during training. However, collecting large quantities of accurate dense correspondence data is very challenging. We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs. Inspired by recent progress in monocular depth estimation, we generate plausible disparity maps from single images. In turn, we use those flawed disparity maps in a carefully designed pipeline to generate stereo training pairs. Training in this manner makes it possible to convert any collection of single RGB images into stereo training data. This results in a significant reduction in human effort, with no need to collect real depths or to hand-design synthetic data. We can…
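The core idea in the abstract, turning a single image plus an estimated disparity map into a stereo pair, can be illustrated with a small sketch. This is not the authors' pipeline, which the abstract describes only as "carefully designed". It is a minimal NumPy illustration that assumes simple forward warping of a rectified left image, with nearer pixels winning where several land on the same target. The function names `depth_to_disparity` and `synthesize_right_view` and the `max_disparity` parameter are hypothetical, and the depth map is assumed to come from any monocular depth estimator.

```python
import numpy as np


def depth_to_disparity(depth, max_disparity=64.0, eps=1e-6):
    """Map a (relative) depth map to a disparity map in pixels.

    Assumption: disparity is proportional to inverse depth, rescaled so
    the nearest point receives `max_disparity` pixels of shift.
    """
    inv = 1.0 / np.maximum(depth, eps)
    return max_disparity * inv / inv.max()


def synthesize_right_view(left, disparity):
    """Forward-warp a left image into a synthetic right view.

    A pixel at column x in the left image is moved to column x - d in
    the right image, where d is its (rounded) disparity. When several
    source pixels land on the same target, the one with the largest
    disparity (the nearest surface) is kept.

    Returns the warped image and a boolean mask of pixels that received
    a value; unfilled pixels are holes left at zero.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    best = np.full((h, w), -np.inf)  # z-buffer holding the winning disparity

    ys, xs = np.mgrid[0:h, 0:w]
    target_x = np.rint(xs - disparity).astype(int)
    valid = (target_x >= 0) & (target_x < w)

    for y, x, t, d in zip(ys[valid], xs[valid], target_x[valid], disparity[valid]):
        if d > best[y, t]:
            best[y, t] = d
            right[y, t] = left[y, x]
            filled[y, t] = True
    return right, filled


# Usage on a toy 1x4 "image": the last pixel is near (disparity 2),
# the rest are far (disparity 0).
left = np.array([[1, 2, 3, 4]])
disparity = np.array([[0.0, 0.0, 0.0, 2.0]])
right, filled = synthesize_right_view(left, disparity)
```

In this toy case the near pixel shifts two columns left and overwrites the far pixel it occludes, while its original column becomes a hole. A training set built this way would pair `left` and `right` as network input with `disparity` as the label; how holes and depth errors are handled is a design choice this sketch leaves open.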
