XoFTR: Cross-modal Feature Matching Transformer
Önder Tuzcuoğlu, Aybora Köksal, Buğra Sofu, Sinan Kalkan, A. Aydın Alatan

TL;DR
XoFTR is a novel transformer-based approach for local feature matching between thermal infrared and visible images, effectively handling modality differences and viewpoint variations through pre-training, augmentation, and refined matching techniques.
Contribution
The paper introduces XoFTR, a cross-modal feature matching transformer that incorporates masked image modeling, pseudo-thermal augmentation, and a refined matching pipeline for improved thermal-visible image matching.
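The summary names masked image modeling (MIM) pre-training but gives no details of how XoFTR applies it. As a rough illustration only, the sketch below shows the generic MAE/SimMIM-style recipe: hide a random subset of image patches and score the reconstruction only on the hidden pixels. The patch size of 8, the 50% mask ratio, and every function name here are assumptions for illustration, not XoFTR's settings.

```python
import numpy as np

def random_patch_mask(h, w, patch=8, ratio=0.5, rng=None):
    """Boolean (h//patch, w//patch) grid; True marks a patch hidden from the encoder."""
    assert h % patch == 0 and w % patch == 0, "sketch assumes H, W divisible by patch"
    rng = np.random.default_rng() if rng is None else rng
    gh, gw = h // patch, w // patch
    n = gh * gw
    flat = np.zeros(n, dtype=bool)
    # Pick exactly round(ratio * n) distinct patches to hide.
    flat[rng.choice(n, size=int(round(ratio * n)), replace=False)] = True
    return flat.reshape(gh, gw)

def grid_to_pixels(grid, patch=8):
    """Upsample the patch grid to a per-pixel boolean mask."""
    return np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)

def mask_image(img, grid, patch=8):
    """Return a copy of a (H, W) image with the masked patches set to zero."""
    out = img.copy()
    out[grid_to_pixels(grid, patch)] = 0.0
    return out

def masked_mse(pred, target, pixel_mask):
    """Reconstruction loss computed only on the hidden pixels (MAE/SimMIM style)."""
    return float(((pred - target) ** 2)[pixel_mask].mean())

# Hypothetical usage: a 32x32 single-channel image split into 16 patches, 8 of them hidden.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
grid = random_patch_mask(32, 32, patch=8, ratio=0.5, rng=rng)
masked = mask_image(img, grid, patch=8)
```

A real pre-training loop would feed `masked` to the network and compare its output with `img` through `masked_mse`; the network itself is omitted from this sketch.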
Findings
Outperforms existing methods on multiple benchmarks
Effective handling of modality, viewpoint, and scale differences
Provides a new comprehensive thermal-visible dataset
Abstract
We introduce XoFTR, a cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions but present difficulties in matching due to significant texture and intensity differences. Current hand-crafted and learning-based methods for visible-TIR matching fall short in handling viewpoint, scale, and texture diversities. To address this, XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences. Additionally, we introduce a refined matching pipeline that adjusts for scale discrepancies and enhances match reliability through sub-pixel level refinement. To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms…
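The abstract credits part of the matching accuracy to sub-pixel level refinement without describing the module. One widely used way to obtain sub-pixel positions from a local correlation window is a soft-argmax, i.e. the expectation of a softmax distribution over window positions, as in LoFTR's fine-level matching. The sketch below shows that generic technique, assuming a square correlation window centred on an integer coarse match. It is a stand-in, not XoFTR's actual refinement; the function names, window size, and temperature are hypothetical.

```python
import numpy as np

def soft_argmax_offset(heatmap, temperature=1.0):
    """Expected (dy, dx) offset from the window centre under softmax(heatmap / T)."""
    h, w = heatmap.shape
    logits = heatmap / temperature
    logits = logits - logits.max()          # numerical stability
    p = np.exp(logits)
    p /= p.sum()                            # probability over window positions
    ys = np.arange(h) - (h - 1) / 2.0       # row offsets relative to the centre
    xs = np.arange(w) - (w - 1) / 2.0       # column offsets relative to the centre
    dy = float((p.sum(axis=1) * ys).sum())  # expectation over rows
    dx = float((p.sum(axis=0) * xs).sum())  # expectation over columns
    return dy, dx

def refine_match(coarse_xy, heatmap, temperature=1.0):
    """Shift an integer coarse match (x, y) by the expected sub-pixel offset."""
    dy, dx = soft_argmax_offset(heatmap, temperature)
    return coarse_xy[0] + dx, coarse_xy[1] + dy

# 3x3 correlation window with two equally strong responses:
# at the centre and one pixel to its right.
heat = np.zeros((3, 3))
heat[1, 1] = heat[1, 2] = 1.0
refined = refine_match((10.0, 20.0), heat, temperature=0.01)
```

Lowering `temperature` sharpens the distribution toward a hard argmax. In the example the two equal responses pull the match halfway between the two pixels, to roughly (10.5, 20.0), which an integer argmax could not represent.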
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
