ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses
Junjie Ni, Guofeng Zhang, Guanglin Li, Yijin Li, Xinyang Liu, Zhaoyang Huang, Hujun Bao

TL;DR
This paper introduces an efficient transformer-based local feature matching method that constructs multiple homography hypotheses and refines matches with uni-directional cross-attention, reaching accuracy competitive with LoFTR at about four times the inference speed.
Contribution
The proposed architecture combines multiple homography hypotheses with uni-directional cross-attention to improve efficiency and accuracy in local feature matching.
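To make the term "homography hypothesis" concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each image patch carries its own 3x3 homography and uses it to map that patch's center into the other image. The function names `apply_homography` and `warp_patch_centers` are hypothetical and chosen only for illustration.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a single 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coordinates
    mapped = homo @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]            # divide out the projective scale

def warp_patch_centers(hypotheses, centers):
    """Map each patch center with that patch's own homography.

    hypotheses: (P, 3, 3) array, one hypothetical homography per patch.
    centers:    (P, 2) array of patch centers in the source image.
    """
    hypotheses = np.asarray(hypotheses, dtype=float)
    centers = np.asarray(centers, dtype=float)
    homo = np.hstack([centers, np.ones((len(centers), 1))])
    mapped = np.einsum("pij,pj->pi", hypotheses, homo)  # per-patch matrix-vector product
    return mapped[:, :2] / mapped[:, 2:3]
```

Because every patch has its own matrix, a set of locally planar mappings can approximate a correspondence field that no single global homography would fit.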
Findings
Achieves accuracy competitive with LoFTR on YFCC100M.
Runs inference about 4 times faster than LoFTR, a state-of-the-art transformer-based method, and also outpaces the CNN-based methods.
Demonstrates effectiveness on further open datasets, including MegaDepth, ScanNet, and HPatches.
Abstract
We tackle the efficiency problem of learning local feature matching. Recent advancements have given rise to purely CNN-based and transformer-based approaches, each augmented with deep learning techniques. While CNN-based methods often excel in matching speed, transformer-based methods tend to provide more accurate matches. We propose an efficient transformer-based network architecture for local feature matching. This technique is built on constructing multiple homography hypotheses to approximate the continuous correspondence in the real world and uni-directional cross-attention to accelerate the refinement. On the YFCC100M dataset, our matching accuracy is competitive with LoFTR, a state-of-the-art transformer-based architecture, while the inference speed is boosted to 4 times, even outperforming the CNN-based methods. Comprehensive evaluations on other open datasets such as MegaDepth,…
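As a rough illustration of uni-directional cross-attention, the sketch below lets features from one image attend to features from the other and updates only the first set. It is a simplified assumption-laden example in NumPy: it omits learned query/key/value projections, multiple heads, and normalization layers, and the names `cross_attention` and `unidirectional_update` are hypothetical rather than taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Scaled dot-product attention from queries (N, D) to context (M, D).

    Returns the attended features (N, D) and the attention weights (N, M).
    No learned projections are used here; that is a simplification.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_feats, weights

def unidirectional_update(feats_a, feats_b):
    """Update only feats_a with information from feats_b (residual add).

    A bi-directional scheme would also run attention from B to A's features
    and update feats_b; skipping that second pass is where the saving comes from.
    """
    attended, _ = cross_attention(feats_a, feats_b)
    return feats_a + attended, feats_b
```

With N query features and M context features, one pass costs on the order of N·M·D operations, so dropping the reverse pass removes roughly half of the cross-attention work in this simplified setting.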
Taxonomy
Topics: Image Retrieval and Classification Techniques · Cancer-related molecular mechanisms research · Advanced Algorithms and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
