AbHE: All Attention-based Homography Estimation
Mingxiao Huo, Zhihao Zhang, Xinyang Ren, Xianqiang Yang

TL;DR
This paper introduces AbHE, a novel transformer-based model for homography estimation that combines local and global features, outperforming existing CNN-based methods in accuracy.
Contribution
It proposes a Swin Transformer-based architecture with a cross non-local layer and attention-based correlation filtering for improved homography estimation.
Findings
Outperforms state-of-the-art CNN-based methods in 8-DOF homography estimation.
Effectively combines local and global features using transformer and CNN components.
Demonstrates superior accuracy in experimental evaluations.
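The "8 DOF" in the findings refers to the eight free parameters of a 3×3 homography matrix, which is defined only up to scale. As a minimal sketch, assuming the widely used four-corner-offset parameterization (this summary does not confirm that AbHE regresses corner offsets), the matrix can be recovered from four point correspondences by fixing h33 = 1 and solving an 8×8 linear system. The function names, patch size, and offsets below are illustrative and not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_points(src, dst):
    """Return the 3x3 matrix H (h33 = 1) mapping four src points to four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v
        a.append([float(x), float(y), 1.0, 0.0, 0.0, 0.0, float(-u * x), float(-u * y)])
        b.append(float(u))
        a.append([0.0, 0.0, 0.0, float(x), float(y), 1.0, float(-v * x), float(-v * y)])
        b.append(float(v))
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical example: corners of a 128x128 patch and made-up corner offsets.
corners = [(0, 0), (127, 0), (127, 127), (0, 127)]
offsets = [(3, -2), (-4, 5), (6, 1), (-1, -3)]
warped = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(corners, offsets)]
H = homography_from_points(corners, warped)
```

Four correspondences give exactly eight equations, which is why eight numbers (for example, four 2D corner offsets) are enough to pin down the transformation.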
Abstract
Homography estimation is a basic computer vision task that aims to obtain the transformation between multi-view images for image alignment. Unsupervised homography estimation trains a convolutional neural network for feature extraction and transformation matrix regression. While state-of-the-art homography methods are based on convolutional neural networks, few works focus on transformers, which show superiority in high-level vision tasks. In this paper, we propose a strong baseline model based on the Swin Transformer, which combines a convolutional neural network for local features with a transformer module for global features. Moreover, a cross non-local layer is introduced to coarsely search for matched features within the feature maps. In the homography regression stage, we adopt an attention layer for the channels of the correlation volume, which can drop out some weak correlation feature…
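The abstract's idea of attending over the channels of a correlation volume can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the all-pairs dot-product correlation, the squeeze-and-excitation-style gate standing in for the paper's attention layer, the tensor sizes, and the random untrained weights `w1` and `w2`. The abstract is truncated and does not specify the actual layer design.

```python
import numpy as np


def correlation_volume(feat_a, feat_b):
    """All-pairs dot-product correlation: (C,H,W) x (C,H,W) -> (H*W, H, W).

    Channel k of the result holds the similarity of position k in feat_a
    to every spatial position in feat_b.
    """
    c, h, w = feat_a.shape
    a = feat_a.reshape(c, h * w)
    b = feat_b.reshape(c, h * w)
    corr = a.T @ b / np.sqrt(c)  # (H*W, H*W), scaled dot products
    return corr.reshape(h * w, h, w)


def channel_attention(volume, w1, w2):
    """SE-style gate: pool each channel, then rescale it by a weight in (0, 1)."""
    squeeze = volume.mean(axis=(1, 2))             # (K,) one summary per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck, (K // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid, (K,)
    return volume * gate[:, None, None], gate


# Hypothetical sizes: 8 feature channels on a 4x4 grid, reduction ratio 4.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((8, 4, 4))
feat_b = rng.standard_normal((8, 4, 4))
volume = correlation_volume(feat_a, feat_b)        # shape (16, 4, 4)
w1 = rng.standard_normal((4, 16)) * 0.1            # untrained, illustrative weights
w2 = rng.standard_normal((16, 4)) * 0.1
filtered, gate = channel_attention(volume, w1, w2)
```

A gate near zero suppresses a correlation channel and a gate near one passes it through, which is one soft way to realise "dropping" weak correlation features. In a trained model the gate weights would be learned jointly with the regression head.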
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Layer Normalization · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Stochastic Depth · Convolution
