Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching

Hongkai Chen; Zixin Luo; Yurun Tian; Xuyang Bai; Ziyu Wang; Lei Zhou; Mingmin Zhen; Tian Fang; David McKinnon; Yanghai Tsin; Long Quan

arXiv:2405.13874·cs.CV·May 24, 2024

TL;DR

This paper introduces affine-based deformable attention and selective fusion for semi-dense image matching. The full network achieves state-of-the-art results among semi-dense matching methods, and a slim version reaches the LoFTR baseline with 15% of the computation and 18% of the parameters.

Contribution

The paper proposes affine-based local attention and a selective fusion mechanism, along with a spatial smoothness loss, to improve the performance and efficiency of semi-dense matching.

Findings

01. The full network achieves state-of-the-art semi-dense matching performance.

02. The slim version reaches the LoFTR baseline with 15% of the computation and 18% of the parameters.

03. The network demonstrates strong matching capacity across different settings.

Abstract

Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformers. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cross-view deformations. Secondly, we present selective fusion to merge local and global messages from cross attention. Apart from network structure, we also identify the importance of enforcing spatial smoothness in loss design, which has been omitted by previous works. Based on these augmentations, our network demonstrates strong matching capacity under different settings. The full version of our network achieves state-of-the-art performance among semi-dense matching methods at a similar cost to…
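The abstract does not spell out how the affine-based local attention chooses its sampling locations, so the following is only a minimal illustrative sketch of the general idea of deforming a local window with an affine transform. The function name `affine_window`, the 2x2 matrix layout, and the `radius` parameter are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float], Tuple[float, float]]


def affine_window(center: Point, affine: Affine, radius: int = 1) -> List[Point]:
    """Hypothetical helper: sampling locations of a (2r+1) x (2r+1) window
    whose regular offsets are deformed by a 2x2 affine matrix."""
    cx, cy = center
    (a, b), (c, d) = affine
    points = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Apply the affine matrix to the regular offset (dx, dy),
            # then shift the result to the window center.
            points.append((cx + a * dx + b * dy, cy + c * dx + d * dy))
    return points


# Assumed example inputs: an identity transform and a uniform 2x scaling.
identity = ((1.0, 0.0), (0.0, 1.0))
scaled = ((2.0, 0.0), (0.0, 2.0))

regular = affine_window((10.0, 10.0), identity)   # plain 3x3 neighborhood
stretched = affine_window((10.0, 10.0), scaled)   # same window, twice as wide
```

With the identity matrix the window is an ordinary 3x3 neighborhood around the center. A non-identity matrix scales, rotates, or shears that neighborhood, which is one way a local attention window could follow a cross-view deformation. In a real network the matrix would presumably be predicted per location and the features sampled with interpolation; both steps are left out here.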

Taxonomy

Topics: Face and Expression Recognition · Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections