EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual

Songyan Zhang; Zhicheng Wang; Qiang Wang; Jinshuo Zhang; Gang Wei; Xiaowen Chu

arXiv:2010.13338 · cs.CV · March 5, 2021 · 1 citation

Open Access

TL;DR

EDNet is an efficient disparity estimation network. It merges contextual information from a squeezed concatenation volume with feature-similarity information from a correlation volume into a single combined volume, and it uses an attention-based spatial residual module to improve accuracy while reducing memory and computation costs.

Contribution

The paper proposes EDNet, a disparity estimation model that uses a combined volume and attention-based residuals for faster, memory-efficient, and accurate disparity prediction.

Findings

1. Outperforms previous 3D CNN methods on Scene Flow and KITTI datasets.

2. Achieves state-of-the-art accuracy with faster inference speed.

3. Consumes less memory compared to existing approaches.

Abstract

Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolutional neural network (CNN) for disparity regression, which is inefficient due to high memory consumption and slow inference speed. In this paper, we propose a network named EDNet for efficient disparity estimation. First, we construct a combined volume that incorporates contextual information from the squeezed concatenation volume and feature similarity measurement from the correlation volume. The combined volume can then be aggregated by 2D convolutions, which are faster and require less memory than 3D convolutions. Second, we propose an attention-based spatial residual module to generate attention-aware residual features. The attention mechanism is applied to provide intuitive spatial evidence about inaccurate regions with the help of error maps…
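
To make the combined-volume idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It builds a correlation volume (channel-wise mean of the product of left features and disparity-shifted right features) and a squeezed concatenation volume, then stacks them along the disparity axis so the result is a 3D tensor that ordinary 2D convolutions can treat as a multi-channel image. The function names, the `proj` vector (a stand-in for a learned channel-squeezing layer), and the plain concatenation used to fuse the two volumes are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def shift_right_features(right, d):
    """Shift right-view features d pixels to the right, zero-filling the border."""
    shifted = np.zeros_like(right)
    if d == 0:
        shifted[:] = right
    else:
        shifted[:, :, d:] = right[:, :, :-d]
    return shifted

def correlation_volume(left, right, max_disp):
    """Feature similarity per candidate disparity. Returns (max_disp, H, W)."""
    return np.stack([
        (left * shift_right_features(right, d)).mean(axis=0)
        for d in range(max_disp)
    ])

def squeezed_concat_volume(left, right, max_disp, proj):
    """Concatenate left and shifted right features, then squeeze the 2C channels
    to one value per pixel with `proj` (hypothetical stand-in for a learned layer).
    Returns (max_disp, H, W)."""
    planes = []
    for d in range(max_disp):
        concat = np.concatenate([left, shift_right_features(right, d)], axis=0)
        planes.append(np.tensordot(proj, concat, axes=(0, 0)))  # (H, W)
    return np.stack(planes)

def combined_volume(left, right, max_disp, proj):
    """Stack both volumes along the disparity axis (assumed fusion rule).
    Returns (2 * max_disp, H, W), which 2D convolutions can consume as channels."""
    corr = correlation_volume(left, right, max_disp)
    ctx = squeezed_concat_volume(left, right, max_disp, proj)
    return np.concatenate([corr, ctx], axis=0)
```

The memory argument follows from the shapes: a 4D concatenation volume holds 2C values per pixel per disparity, whereas each volume in this sketch holds one value per pixel per disparity, and no 3D convolution is needed to aggregate it.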

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: 3 Dimensional Convolutional Neural Network · 3D Convolution · Convolution