FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer
Xinyu Zhang, Li Wang, Zhiqiang Jiang, Kun Dai, Tao Xie, Lei Yang, Wenhao Yu, Yang Shen, Jun Li

TL;DR
FMRT introduces a novel Transformer-based approach that adaptively reconciles multi-scale features and enhances positional encoding, significantly improving local feature matching accuracy in various computer vision tasks.
Contribution
The paper presents FMRT, a new detector-free Transformer method with a Reconciliatory Transformer that adaptively integrates multi-scale features and reliable positional encoding.
Findings
FMRT outperforms existing methods on multiple benchmarks.
It achieves higher accuracy in pose estimation and visual localization.
The approach demonstrates robustness across diverse computer vision tasks.
Abstract
Local Feature Matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been effectively settled by Transformer-based methods. However, these methods only integrate long-range context information among keypoints with a fixed receptive field, which constrains the network from reconciling the importance of features with different receptive fields to realize complete image perception, hence limiting the matching accuracy. In addition, these methods utilize a conventional handcrafted encoding approach to integrate the positional information of keypoints into the visual descriptors, which limits the capability of the network to extract reliable positional encoding message. In this study, we propose Feature Matching with Reconciliatory Transformer (FMRT), a novel Transformer-based detector-free method that reconciles different…
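As a point of reference for the "conventional handcrafted encoding approach" the abstract criticizes, the following is a minimal sketch of a fixed 2D sinusoidal positional encoding added to a grid of descriptors. It is not FMRT's method and is not taken from the paper: the function names `sinusoidal_encoding_2d` and `add_position`, the channel layout, and the base constant 10000 (borrowed from the common 1D sinusoidal scheme) are all assumptions made for illustration.

```python
import math


def sinusoidal_encoding_2d(height, width, dim):
    """Return a fixed (non-learned) encoding as a nested list [height][width][dim].

    Hypothetical layout: the four quarters of the channel axis hold
    sin(x), cos(x), sin(y), cos(y) at geometrically spaced frequencies.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    quarter = dim // 4
    # Geometric frequency schedule, as in the usual 1D sinusoidal encoding.
    freqs = [1.0 / (10000.0 ** (k / quarter)) for k in range(quarter)]
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            vec = (
                [math.sin(x * f) for f in freqs]
                + [math.cos(x * f) for f in freqs]
                + [math.sin(y * f) for f in freqs]
                + [math.cos(y * f) for f in freqs]
            )
            row.append(vec)
        grid.append(row)
    return grid


def add_position(descriptors):
    """Add the fixed encoding element-wise to a [height][width][dim] descriptor grid."""
    height = len(descriptors)
    width = len(descriptors[0])
    dim = len(descriptors[0][0])
    enc = sinusoidal_encoding_2d(height, width, dim)
    return [
        [
            [d + e for d, e in zip(descriptors[y][x], enc[y][x])]
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Because every value here is determined by the pixel coordinates alone, nothing in the encoding adapts to image content or training data, which is the limitation the abstract attributes to handcrafted schemes.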
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Absolute Position Encodings · Adam · Byte Pair Encoding
