LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching
Wenhao Zhong, Jie Jiang

TL;DR
LGFCTR introduces a convolutional transformer that effectively captures both local and global features for robust image matching, outperforming existing methods across various benchmarks.
Contribution
The paper proposes a novel convolutional transformer architecture that combines local convolutions with global self-attention for improved image matching accuracy.
Findings
Achieves superior performance on multiple image matching benchmarks.
Effectively captures multi-scale long-range dependencies.
Enhances locality through a novel multi-scale attention mechanism.
Abstract
Image matching, which aims to find robust and accurate correspondences across images, is a challenging task under extreme conditions. Capturing local and global features simultaneously is an important way to mitigate this issue, but recent transformer-based decoders are still limited because CNN-based encoders extract only local features and transformers lack locality. Inspired by the locality and implicit positional encoding of convolutions, a novel convolutional transformer is proposed to capture both local contexts and global structures more fully for detector-free matching. First, a universal FPN-like framework captures global structures in the self-encoder as well as the cross-decoder with transformers, and compensates for local contexts and implicit positional encoding with convolutions. Second, a novel convolutional transformer module explores multi-scale long range…
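The general idea of pairing global self-attention with a local convolution can be illustrated with a small NumPy sketch. This is not the LGFCTR module itself: the function names, the single attention head, the depthwise 3x3 kernel, and the additive fusion are all assumptions chosen for illustration, since the abstract shown here does not specify those details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(feat, wq, wk, wv):
    """Single-head attention over all positions of an (H, W, C) feature map.

    Every position attends to every other position, so the output at each
    pixel can depend on the whole image (global structure).
    """
    h, w, c = feat.shape
    tokens = feat.reshape(h * w, c)                 # flatten the spatial grid
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)   # (H*W, H*W) weights
    return (attn @ v).reshape(h, w, c)

def depthwise_conv3x3(feat, kernel):
    """Depthwise 3x3 convolution, zero padding, stride 1.

    feat: (H, W, C), kernel: (3, 3, C). Each output pixel only sees its
    3x3 neighbourhood (local context).
    """
    h, w, c = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(feat.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out

def conv_attention_block(feat, wq, wk, wv, kernel):
    # Hypothetical fusion: residual + global branch + local branch.
    return (feat
            + global_self_attention(feat, wq, wk, wv)
            + depthwise_conv3x3(feat, kernel))

# Usage with random weights on a tiny 4x5 feature map with 8 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 8))
wq, wk, wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
kernel = rng.standard_normal((3, 3, 8)) * 0.1
out = conv_attention_block(feat, wq, wk, wv, kernel)
print(out.shape)
```

In this sketch the attention branch carries long-range information and the convolution branch restores neighbourhood structure that plain attention ignores; the convolution is also position-aware at image borders through zero padding, which is one way convolutions can act as an implicit positional signal.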
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Medical Image Segmentation Techniques
