TAR: Text Semantic Assisted Cross-modal Image Registration Framework for Optical and SAR Images
Zhuoyu Cai, Dou Quan, Ning Huyan, Pei He, Shuang Wang, Licheng Jiao

TL;DR
This paper introduces TAR, a framework that uses text semantic priors from remote sensing scenes and land-cover categories to improve cross-modal registration between optical and SAR images, especially under large geometric deformations.
Contribution
TAR integrates text semantic priors with visual features to enhance cross-modal registration, addressing appearance discrepancies and complex spatial transformations.
Findings
TAR outperforms state-of-the-art methods in cross-modal registration accuracy.
The framework achieves significant improvements under large geometric deformations.
Experimental results validate the effectiveness of text-assisted feature enhancement.
Abstract
Existing deep learning-based methods can capture shared features from optical and synthetic aperture radar (SAR) images for spatial alignment. However, optical-SAR registration remains challenging under large geometric deformations, because the model needs to simultaneously handle cross-modal appearance discrepancies and complex spatial transformations. To address this issue, this paper proposes a text semantic-assisted cross-modal image registration framework, named TAR, for optical and SAR images. TAR exploits text semantic priors from remote sensing scenes and land-cover categories to alleviate the modality gap and enhance cross-modal feature learning. TAR consists of three components: a multi-scale visual feature learning (MSFL) module, a text-assisted feature enhancement (TAFE) module, and a coarse-to-fine dense matching (CFDM) module. MSFL extracts multi-scale visual features from…
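The idea of enhancing visual features with text semantic priors can be illustrated with a minimal sketch. The excerpt above does not describe how TAFE works internally, so everything here is an assumption made for illustration: the function name `enhance_with_text`, the softmax attention over land-cover text embeddings, and the residual addition are hypothetical and are not the authors' implementation. The toy 2-D vectors stand in for embeddings that a real text encoder and visual backbone would produce.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def enhance_with_text(visual_feats, text_embeds, scale=1.0):
    """Hypothetical text-assisted enhancement (an assumption, not the paper's TAFE).

    Each visual feature attends over the text embeddings of land-cover
    categories, and the attention-weighted text context is added back
    to the visual feature as a residual.
    """
    enhanced = []
    for v in visual_feats:
        # Similarity of this visual feature to every category embedding.
        weights = softmax([scale * dot(v, t) for t in text_embeds])
        # Attention-weighted mixture of the text embeddings.
        context = [
            sum(w * t[i] for w, t in zip(weights, text_embeds))
            for i in range(len(v))
        ]
        # Residual connection keeps the original visual information.
        enhanced.append([vi + ci for vi, ci in zip(v, context)])
    return enhanced

# Toy example: two category embeddings (say, "water" and "building")
# and one visual feature that resembles the first category.
text_embeds = [[1.0, 0.0], [0.0, 1.0]]
visual_feats = [[2.0, 0.0]]
out = enhance_with_text(visual_feats, text_embeds)
```

In this toy case the visual feature is pulled further toward the category it already resembles. Under this assumed mechanism, optical and SAR features of the same land-cover type would be pulled toward the same text embedding, which is one plausible way a shared text prior could narrow the modality gap.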
