MSDC-Net: Multi-Scale Dense and Contextual Networks for Automated Disparity Map for Stereo Matching
Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, and Renjie He

TL;DR
This paper introduces MSDC-Net, a novel deep learning architecture that combines multi-scale fusion and residual 3D convolutions to improve disparity map prediction accuracy in stereo matching tasks, especially in non-occluded regions.
Contribution
The paper proposes a new multi-scale dense and contextual network architecture for stereo disparity estimation, integrating multi-scale fusion and residual 3D convolutions for enhanced performance.
Findings
Outperforms existing methods on Scene Flow and KITTI datasets
Achieves higher accuracy in non-occluded regions
Demonstrates effective multi-scale feature fusion
Abstract
Disparity prediction from stereo images is essential to computer vision applications including autonomous driving, 3D model reconstruction, and object detection. To predict an accurate disparity map, we propose a novel deep learning architecture, called MSDC-Net, for detecting the disparity map from a rectified pair of stereo images. Our MSDC-Net contains two modules: a multi-scale fusion 2D convolution module and a multi-scale residual 3D convolution module. The multi-scale fusion 2D convolution module exploits potential multi-scale features by extracting and fusing features at different scales with Dense-Net. The multi-scale residual 3D convolution module learns geometry context at different scales from the cost volume, which is aggregated by the multi-scale fusion 2D convolution module. Experimental results on Scene Flow and KITTI datasets demonstrate that our MSDC-Net significantly outperforms…
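To make the term "cost volume" concrete, here is a minimal toy sketch in plain Python. It is not the MSDC-Net method: the paper builds its cost volume from learned multi-scale features and processes it with 3D convolutions, whereas this sketch assumes raw intensities on a single rectified scanline, an absolute-difference matching cost, and a winner-take-all readout. The function names `cost_volume` and `winner_take_all` are hypothetical and chosen only for illustration.

```python
def cost_volume(left, right, max_disp):
    """Build a toy cost volume for one rectified scanline.

    volume[d][x] = |left[x] - right[x - d]|, the cost of matching left
    pixel x to the right pixel shifted by disparity d. Positions where
    x - d falls outside the image get an infinite cost.
    (Assumption: absolute difference on raw intensities, not learned features.)
    """
    width = len(left)
    volume = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(abs(left[x] - right[x - d]))
            else:
                row.append(float("inf"))
        volume.append(row)
    return volume


def winner_take_all(volume):
    """Pick, for every pixel, the disparity with the lowest cost."""
    width = len(volume[0])
    return [
        min(range(len(volume)), key=lambda d: volume[d][x])
        for x in range(width)
    ]


# Toy scanline: the left view equals the right view shifted by 2 pixels.
# The first two left pixels have no true match and are filled with 0.
right = [10, 20, 30, 40, 50, 60, 70, 80]
left = [0, 0, 10, 20, 30, 40, 50, 60]

volume = cost_volume(left, right, max_disp=3)
disparity = winner_take_all(volume)
print(disparity)
```

In this toy case every pixel that has a valid match recovers the shift of 2, while the two border pixels without a true correspondence get arbitrary values. Ambiguous regions like these are one reason a learned network would aggregate context over the volume rather than take a per-pixel minimum.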
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
Methods: 3D Convolution · Convolution
