Multi-scale Alternated Attention Transformer for Generalized Stereo Matching
Wei Miao, Hong Zhao, Tongjia Chen, Wei Huang, Changyan Xiao

TL;DR
This paper introduces AAUformer, a novel multi-scale alternated attention transformer network that enhances stereo matching by balancing intra- and inter-view features, achieving state-of-the-art results and strong generalization.
Contribution
The paper proposes a new AAUformer architecture with window self-attention and multi-scale alternated attention for improved stereo matching and generalization.
Findings
Achieves state-of-the-art performance on the Scene Flow dataset.
Performs competitively on KITTI 2015 after fine-tuning.
Outperforms existing methods in cross-dataset generalization.
Abstract
Recent stereo matching networks achieve dramatic performance by introducing the epipolar line constraint to limit the matching range between the two views. However, in complicated real-world scenarios, feature information based on the intra-epipolar line alone is too weak to facilitate stereo matching. In this paper, we present a simple but highly effective network called Alternated Attention U-shaped Transformer (AAUformer) to balance the impact of the epipolar line in the dual view and the single view respectively, for excellent generalization performance. Compared to other models, our model has several main designs: 1) to better liberate the local semantic features of the single view at the pixel level, we introduce window self-attention to break the limits of intra-row self-attention and completely replace the convolutional network for denser features before cross-matching; 2) the multi-scale alternated attention…
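The contrast the abstract draws between intra-row self-attention and window self-attention can be illustrated with a toy example. The sketch below is not the AAUformer implementation: it assumes a single attention head, identity query/key/value projections, no positional encoding, and non-overlapping windows, and every function name in it is hypothetical. It only shows how the set of pixels each pixel attends to changes between the two schemes.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: Q, K and V are the tokens themselves
    (identity projections), so no learned weights are involved.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def row_groups(height, width):
    """Intra-row grouping: a pixel attends only to pixels on its own row."""
    return [[(y, x) for x in range(width)] for y in range(height)]


def window_groups(height, width, win):
    """Window grouping: a pixel attends to every pixel in its win x win window."""
    assert height % win == 0 and width % win == 0
    return [
        [(y, x) for y in range(y0, y0 + win) for x in range(x0, x0 + win)]
        for y0 in range(0, height, win)
        for x0 in range(0, width, win)
    ]


def grouped_self_attention(feat, groups):
    """Run self-attention independently inside each group of pixel coordinates."""
    height, width = len(feat), len(feat[0])
    out = [[None] * width for _ in range(height)]
    for group in groups:
        tokens = [feat[y][x] for y, x in group]
        for (y, x), vec in zip(group, attend(tokens)):
            out[y][x] = vec
    return out


# Toy 4x4 feature map with two channels: (row index, column index).
feat = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
row_out = grouped_self_attention(feat, row_groups(4, 4))
win_out = grouped_self_attention(feat, window_groups(4, 4, 2))
```

In this toy setup the first channel stores the row index, so intra-row attention leaves it unchanged (every token in a group carries the same value), while window attention mixes values from neighbouring rows. That mixing across rows is the extra vertical context the abstract refers to.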
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Image Enhancement Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dense Connections · Label Smoothing · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding
