EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual
Songyan Zhang, Zhicheng Wang, Qiang Wang, Jinshuo Zhang, Gang Wei, Xiaowen Chu

TL;DR
EDNet introduces an efficient disparity estimation network that combines contextual and similarity information in a novel volume, using attention-based residuals to improve accuracy while reducing memory and computation costs.
Contribution
The paper proposes EDNet, a disparity estimation model that uses a combined volume and attention-based residuals for faster, memory-efficient, and accurate disparity prediction.
Findings
Outperforms previous 3D CNN methods on Scene Flow and KITTI datasets.
Achieves state-of-the-art accuracy with faster inference speed.
Consumes less memory compared to existing approaches.
Abstract
Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression, which is inefficient due to the high memory consumption and slow inference speed. In this paper, we propose a network named EDNet for efficient disparity estimation. Firstly, we construct a combined volume which incorporates contextual information from the squeezed concatenation volume and feature similarity measurement from the correlation volume. The combined volume can be next aggregated by 2D convolutions which are faster and require less memory than 3D convolutions. Secondly, we propose an attention-based spatial residual module to generate attention-aware residual features. The attention mechanism is applied to provide intuitive spatial evidence about inaccurate regions with the help of error maps…
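The combined volume described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(C, H, W)` feature layout, and the fixed `weights` vector used to squeeze the concatenation volume are assumptions made for illustration, since the abstract does not say how the squeeze is performed. The point is only that both parts end up with shape `(D, H, W)`, so their stack is a 3D tensor that 2D convolutions can aggregate.

```python
import numpy as np


def correlation_volume(left, right, max_disp):
    """Feature similarity for each candidate disparity, shape (max_disp, H, W)."""
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d; columns x < d stay zero.
        volume[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return volume


def squeezed_concat_volume(left, right, max_disp, weights):
    """Concatenated left/right features reduced to one channel per disparity.

    `weights` (length 2C) is a hypothetical stand-in for a learned reduction.
    """
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        stacked = np.concatenate([left[:, :, d:], right[:, :, : w - d]], axis=0)
        # Weighted sum over the 2C channels gives an (H, W - d) slice.
        volume[d, :, d:] = np.tensordot(weights, stacked, axes=1)
    return volume


def combined_volume(left, right, max_disp, weights):
    """Stack similarity and contextual volumes along the channel axis."""
    corr = correlation_volume(left, right, max_disp)
    ctx = squeezed_concat_volume(left, right, max_disp, weights)
    return np.concatenate([corr, ctx], axis=0)  # shape (2 * max_disp, H, W)
```

For feature maps of shape `(C, H, W)` and `max_disp = D`, `combined_volume` returns an array of shape `(2 * D, H, W)`, whereas an unsqueezed concatenation volume would have shape `(2 * C, D, H, W)` and call for 3D convolutions.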
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: 3 Dimensional Convolutional Neural Network · 3D Convolution · Convolution
