Multi-scale Alternated Attention Transformer for Generalized Stereo Matching

Wei Miao; Hong Zhao; Tongjia Chen; Wei Huang; Changyan Xiao

arXiv:2308.03048·cs.CV·August 8, 2023

Open Access

TL;DR

This paper introduces AAUformer, a multi-scale alternated attention transformer for stereo matching that balances single-view (intra-view) and dual-view (inter-view) features. It reports state-of-the-art results on Scene Flow and strong cross-dataset generalization.

Contribution

The paper proposes the AAUformer architecture, which combines window self-attention with multi-scale alternated attention to improve stereo matching accuracy and generalization.

Findings

01. Achieves state-of-the-art performance on the Scene Flow dataset.

02. Performs competitively on KITTI 2015 after fine-tuning.

03. Outperforms existing methods in cross-dataset generalization.

Abstract

Recent stereo matching networks achieve dramatic performance by introducing the epipolar line constraint to limit the matching range between the two views. However, in complicated real-world scenarios, feature information drawn from within the epipolar line alone is too weak to support stereo matching. In this paper, we present a simple but highly effective network called Alternated Attention U-shaped Transformer (AAUformer), which balances the impact of the epipolar line in the dual view and in the single view to achieve excellent generalization performance. Compared to other models, our model has several main designs: 1) to better liberate the local semantic features of the single view at the pixel level, we introduce window self-attention to break the limits of intra-row self-attention and completely replace the convolutional network for denser features before cross-matching; 2) the multi-scale alternated attention…
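The contrast the abstract draws between intra-row self-attention and window self-attention can be illustrated with a small NumPy sketch. This is not the authors' implementation: it uses a single head with no learned projections (queries, keys, and values are the raw features), and the function names `row_attention` and `window_attention` and the window size `win` are hypothetical choices made only for illustration. It shows just which pixels are allowed to attend to each other in each scheme.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # tokens: (groups, n, c). Attention is computed independently per group.
    # Assumption: Q = K = V = tokens (no learned projections).
    c = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)
    return softmax(scores, axis=-1) @ tokens

def row_attention(feat):
    # feat: (H, W, C). Each image row is one group, so a pixel only
    # attends to pixels on its own row (the epipolar line after rectification).
    return self_attention(feat)

def window_attention(feat, win):
    # feat: (H, W, C). Each non-overlapping win x win patch is one group,
    # so a pixel also attends to neighbours on adjacent rows.
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be multiples of win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)
    x = self_attention(x)
    # Undo the window partition to recover the (H, W, C) layout.
    x = x.reshape(H // win, W // win, win, win, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)
```

In this sketch, perturbing a pixel on the row directly below changes the `window_attention` output at a location but leaves the `row_attention` output there untouched, which is the sense in which the window breaks the row limit. Real window-attention layers add learned projections, multiple heads, and usually positional terms.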

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Image Enhancement Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dense Connections · Label Smoothing · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding