Efficient Linear Attention for Fast and Accurate Keypoint Matching
Suwichaya Suwanwimolkul, Satoshi Komorita

TL;DR
This paper applies an efficient linear attention to sparse keypoint matching for 3D vision, reducing the cost of attention from quadratic to linear while keeping accuracy competitive. It also learns feature matching and description jointly, which allows simpler and faster matching than Sinkhorn.
Contribution
It combines an efficient linear attention with a new attentional aggregation that draws on both global and local information from sparse keypoints, and with joint learning of feature matching and description, enabling simpler and faster keypoint matching than Sinkhorn-based approaches.
Findings
Achieves competitive accuracy with only 0.84M learnable parameters.
Reduces the computational complexity of attention from quadratic to linear.
Is competitive with much larger models, including SuperGlue (12M parameters) and SGMNet, on multiple benchmarks.
Abstract
Recently Transformers have provided state-of-the-art performance in sparse matching, crucial to realize high-performance 3D vision applications. Yet, these Transformers lack efficiency due to the quadratic computational complexity of their attention mechanism. To solve this problem, we employ an efficient linear attention for the linear computational complexity. Then, we propose a new attentional aggregation that achieves high accuracy by aggregating both the global and local information from sparse keypoints. To further improve the efficiency, we propose the joint learning of feature matching and description. Our learning enables simpler and faster matching than Sinkhorn, often used in matching the learned descriptors from Transformers. Our method achieves competitive performance with only 0.84M learnable parameters against the bigger SOTAs, SuperGlue (12M parameters) and SGMNet (30M…
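To illustrate the quadratic-to-linear reduction mentioned in the abstract, here is a minimal sketch of one common kernelized form of linear attention, using the feature map elu(x) + 1. This is an assumption chosen for illustration: the abstract does not say which linear attention the authors employ, and the function names below are hypothetical. The sketch uses only the Python standard library and compares the reordered (linear) computation against the pairwise (quadratic) one for the same kernel.

```python
import math


def feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it is always positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def quadratic_attention(Q, K, V):
    # Reference form: one similarity per (query, key) pair, so cost grows with N * M.
    out = []
    for q in Q:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in K]
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    # Reordered form: summarize all keys and values once, then reuse the
    # summary for every query, so cost grows with N + M.
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum over keys of phi(k) v^T
    k_sum = [0.0] * d                    # sum over keys of phi(k)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for i in range(d):
            k_sum[i] += fk[i]
            for c in range(dv):
                kv[i][c] += fk[i] * v[c]
    out = []
    for q in Q:
        fq = feature_map(q)
        z = sum(fq[i] * k_sum[i] for i in range(d))
        out.append([sum(fq[i] * kv[i][c] for i in range(d)) / z
                    for c in range(dv)])
    return out
```

Both functions return the same values up to floating-point rounding; the difference is only in the order of the sums. The linear form never materializes the table of query-key similarities, which is what removes the quadratic term in the number of keypoints.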
