Towards Generalized Multimodal Homography Estimation

Jinkun You; Jiaxin Cheng; Jie Zhang; Yicong Zhou

arXiv:2603.03956·cs.CV·March 5, 2026

Open Access

TL;DR

This paper introduces a novel training data synthesis method and a specialized network to enhance the robustness and generalization of homography estimation across different modalities and domains.

Contribution

It presents a new data synthesis approach that generates diverse, structure-preserving image pairs from a single image, and a network that leverages cross-scale information and decouples color information from feature representations to improve estimation accuracy.

Findings

01 Synthetic data improves cross-domain generalization.

02 The proposed network outperforms existing methods.

03 Enhanced robustness across unseen modalities.

Abstract

Supervised and unsupervised homography estimation methods depend on image pairs tailored to specific modalities to achieve high accuracy. However, their performance deteriorates substantially when applied to unseen modalities. To address this issue, we propose a training data synthesis method that generates unaligned image pairs with ground-truth offsets from a single input image. Our approach renders the image pairs with diverse textures and colors while preserving their structural information. These synthetic data empower the trained model to achieve greater robustness and improved generalization across various domains. Additionally, we design a network to fully leverage cross-scale information and decouple color information from feature representations, thus improving estimation accuracy. Extensive experiments show that our training data synthesis method improves generalization…
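
To make "ground-truth offsets" concrete, the sketch below shows one common way such labels are produced in homography estimation work: the four corners of a patch are perturbed by random offsets, and the homography mapping the original corners to the perturbed ones is solved from the resulting eight linear equations. This is an assumption about the general setup, not the paper's synthesis method; the texture and color rendering described in the abstract is not reproduced here, and all function names and parameters (`sample_pair_label`, `patch_size`, `max_offset`) are hypothetical.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def homography_from_corners(src, dst):
    """Return the 3x3 homography (with h33 fixed to 1) mapping src to dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), same for v
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_h(H, pt):
    """Apply homography H to a 2D point (with perspective division)."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def sample_pair_label(patch_size=128, max_offset=32, rng=None):
    """Sample random corner offsets (the label) and the matching homography."""
    rng = rng or random.Random()
    s = float(patch_size - 1)
    corners = [(0.0, 0.0), (s, 0.0), (s, s), (0.0, s)]
    offsets = [(rng.uniform(-max_offset, max_offset),
                rng.uniform(-max_offset, max_offset)) for _ in corners]
    warped = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(corners, offsets)]
    return corners, offsets, homography_from_corners(corners, warped)
```

In a full pipeline, the returned homography would be used to warp the source image into the second view, while the eight offset values serve as the regression target. Any appearance changes applied to the two views (for example, the texture and color variation the abstract mentions) leave this label unchanged, since they do not move pixels.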

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Digital Media Forensic Detection