ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning
Yuxin Deng, Jiayi Ma

TL;DR
ReDFeat introduces a stable, recoupled training framework for multimodal local feature detection and description, utilizing a mutual weighting strategy and a novel Super Detector to improve cross-modal matching performance.
Contribution
The paper proposes a new recoupling strategy with detached weights and a Super Detector for stable end-to-end training of multimodal features, outperforming previous methods.
Findings
ReDFeat surpasses state-of-the-art methods in cross-modal feature matching.
The recoupled training framework improves stability and performance.
The model can be trained from scratch without pre-training.
Abstract
Deep-learning-based local feature extraction algorithms that combine detection and description have made significant progress in visible image matching. However, the end-to-end training of such frameworks is notoriously unstable due to the lack of strong supervision of detection and the inappropriate coupling between detection and description. The problem is magnified in cross-modal scenarios, in which most methods rely heavily on pre-training. In this paper, we recouple independent constraints of detection and description of multimodal feature learning with a mutual weighting strategy, in which the detected probabilities of robust features are forced to peak and repeat, while features with high detection scores are emphasized during optimization. Different from previous works, those weights are detached from back propagation so that the detected probability of indistinct features…
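To make "weights detached from back propagation" concrete, here is a minimal toy sketch in plain Python. It is not the ReDFeat implementation: the function names, the linear weighted loss, and the numeric values are all assumptions chosen for illustration. It compares the gradient that a score-weighted description loss sends back to the detection scores when the weights are the live scores (coupled) against the case where the weights are a frozen snapshot (detached). In an autograd framework the frozen snapshot would correspond to a stop-gradient on the weights, for example `torch.Tensor.detach` in PyTorch or `jax.lax.stop_gradient` in JAX.

```python
# Toy illustration with hypothetical names; not the authors' code.

def weighted_description_loss(scores, desc_losses, weights=None):
    """Sum of per-feature description losses, weighted per feature.

    If `weights` is given, it acts as a frozen (detached) copy of the
    detection scores. Otherwise the live scores are used as weights,
    so the loss stays coupled to them.
    """
    w = scores if weights is None else weights
    return sum(wi * li for wi, li in zip(w, desc_losses))


def grad_wrt_scores(loss_fn, scores, eps=1e-6):
    """Central finite-difference gradient of loss_fn with respect to scores."""
    grads = []
    for i in range(len(scores)):
        up = list(scores)
        up[i] += eps
        down = list(scores)
        down[i] -= eps
        grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grads


scores = [0.9, 0.2, 0.6]       # detection probabilities (made-up values)
desc_losses = [0.1, 1.5, 0.4]  # per-feature description losses (made-up values)

# Coupled: the weights are the live scores, so the description loss
# sends a gradient back to every detection score.
coupled_grad = grad_wrt_scores(
    lambda s: weighted_description_loss(s, desc_losses), scores)

# Detached: the weights are a frozen snapshot, so the description loss
# sends no gradient to the detection scores.
frozen = list(scores)
detached_grad = grad_wrt_scores(
    lambda s: weighted_description_loss(s, desc_losses, weights=frozen), scores)

print("coupled :", coupled_grad)
print("detached:", detached_grad)
```

In the coupled case each score receives a gradient equal to its feature's description loss, so gradient descent on this toy loss pushes down the scores of features that are hard to describe. In the detached case the scores still decide how much each feature counts, but the description loss no longer acts on them directly.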
