End-to-End Learning of Geometry and Context for Deep Stereo Regression
Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry

TL;DR
This paper introduces a deep learning approach for stereo disparity estimation that uses geometric knowledge, a cost volume, and 3D convolutions to achieve high accuracy and speed without post-processing.
Contribution
It presents a novel end-to-end deep learning architecture utilizing a differentiable soft argmin for disparity regression, improving accuracy and efficiency over previous methods.
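The soft argmin mentioned above can be illustrated with a short NumPy sketch. It assumes the usual formulation: take a softmax over the negated costs along the disparity axis, then return the probability-weighted sum of disparity indices. The function name `soft_argmin` and the array layout (disparity on axis 0) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def soft_argmin(cost, axis=0):
    """Return the expected disparity under softmax(-cost) along `axis`.

    Illustrative sketch only: `cost` is assumed to hold one matching cost
    per candidate disparity along `axis` (lower cost = better match).
    """
    cost = np.asarray(cost, dtype=np.float64)
    neg = -cost
    # Subtract the max before exponentiating for numerical stability.
    neg = neg - neg.max(axis=axis, keepdims=True)
    prob = np.exp(neg)
    prob /= prob.sum(axis=axis, keepdims=True)

    # Disparity indices 0..D-1, shaped to broadcast along `axis`.
    num_disp = cost.shape[axis]
    shape = [1] * cost.ndim
    shape[axis] = num_disp
    disparities = np.arange(num_disp, dtype=np.float64).reshape(shape)

    # Probability-weighted sum of indices: a smooth, sub-pixel estimate.
    return (prob * disparities).sum(axis=axis)
```

Unlike a hard argmin, this output varies smoothly with the costs, so gradients can flow through it, and it can land between integer disparities when two neighboring candidates have similar cost.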
Findings
Sets a new state-of-the-art result on the KITTI benchmark.
Runs significantly faster than competing approaches.
Trains end-to-end to sub-pixel accuracy without additional post-processing or regularization.
Abstract
We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem's geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new state-of-the-art benchmark, while being significantly faster than competing approaches.
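One way to picture the cost volume described in the abstract is the NumPy sketch below. It assumes a concatenation-based construction: for each candidate disparity d, the left feature at column x is paired with the right feature at column x - d, which is where a rectified stereo pair would place the matching pixel. The function name `build_cost_volume`, the (F, H, W) feature layout, and the zero padding for columns with no valid match are all assumptions for illustration. The 3-D convolutions would then operate over the disparity, height, and width axes of the result.

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    """Stack left/right feature pairs for every candidate disparity.

    Illustrative sketch only. Inputs are feature maps of shape (F, H, W);
    the result has shape (max_disp, 2 * F, H, W). Assumes max_disp <= W.
    """
    num_feat, height, width = left_feat.shape
    volume = np.zeros((max_disp, 2 * num_feat, height, width),
                      dtype=left_feat.dtype)
    for d in range(max_disp):
        # Left features stay in place for columns that have a valid match.
        volume[d, :num_feat, :, d:] = left_feat[:, :, d:]
        # Right features are shifted by d so column x holds right[x - d].
        volume[d, num_feat:, :, d:] = right_feat[:, :, :width - d]
    return volume
```

For example, if the left features are the right features shifted by two columns, the two halves of `volume[2]` agree wherever a match exists, which is the signal a learned matching stage could pick up on.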
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques
