DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery
Junming Zhang, Katherine A. Skinner, Ram Vasudevan, Matthew Johnson-Roberson

TL;DR
DispSegNet is an end-to-end CNN that jointly estimates disparity and semantic segmentation, using a two-stage refinement and unsupervised training, significantly improving accuracy in challenging regions for autonomous driving.
Contribution
The paper introduces a novel CNN architecture that couples disparity estimation with semantic segmentation, including a two-stage refinement and unsupervised training approach.
Findings
Achieves state-of-the-art results on KITTI and Cityscapes datasets.
Leverages semantic embeddings to improve disparity accuracy.
Capable of real-time disparity and semantic label output.
Abstract
Recent work has shown that convolutional neural networks (CNNs) can be applied successfully in disparity estimation, but these methods still suffer from errors in regions with low texture, occlusions, and reflections. Concurrently, deep learning for semantic segmentation has shown great progress in recent years. In this paper, we design a CNN architecture that combines these two tasks to improve the quality and accuracy of disparity estimation with the help of semantic segmentation. Specifically, we propose a network structure in which these two tasks are highly coupled. One key novelty of this approach is the two-stage refinement process. Initial disparity estimates are refined with an embedding learned from the semantic segmentation branch of the network. The proposed model is trained using an unsupervised approach, in which images from one half of the stereo pair are warped and compared…
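The warp-and-compare idea mentioned at the end of the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract is truncated here, so the choice of warping the right image into the left view, the linear interpolation, the plain L1 photometric error, and the function names `warp_right_to_left` and `photometric_l1` are all assumptions made for illustration. Images are taken to be single-channel float arrays of shape (H, W).

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    Assumption: rectified stereo, so correspondences lie on the same row.
    Returns the warped image and a mask of pixels whose source column
    falls inside the image.
    """
    h, w = disparity.shape
    # Source column in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    valid = (xs >= 0) & (xs <= w - 1)
    xs = np.clip(xs, 0, w - 1)
    # Linear interpolation between the two nearest columns.
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    warped = (1 - frac) * right[rows, x0] + frac * right[rows, x1]
    return warped, valid

def photometric_l1(left, right, disparity):
    """Mean absolute difference between the left image and the warped right image."""
    warped, valid = warp_right_to_left(right, disparity)
    if not valid.any():
        return 0.0  # no pixel has a source inside the image
    return float(np.abs(left - warped)[valid].mean())

# Toy example: the left image is the right image shifted by 3 pixels.
rng = np.random.default_rng(0)
right = rng.random((4, 10))
left = np.zeros_like(right)
left[:, 3:] = right[:, :-3]

true_disp = np.full((4, 10), 3.0)
zero_disp = np.zeros((4, 10))
loss_true = photometric_l1(left, right, true_disp)
loss_zero = photometric_l1(left, right, zero_disp)
```

In this toy setup the correct disparity reconstructs the left image exactly, so `loss_true` is zero while `loss_zero` is not. A training loop would minimize such an error with respect to the network's predicted disparity, which is why no ground-truth disparity labels are needed; in practice this requires a differentiable sampler in a deep learning framework rather than NumPy.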
