AlignFreeNet: Is Cross-Modal Pre-Alignment Necessary? An End-to-End Alignment-Free Lightweight Network for Visible-Infrared Object Detection

Dingkun Zhu; Haote Zhang; Lipeng Gu; Wuzhou Quan; Fu Lee Wang; Honghui Fan; Jiali Tang; Haoran Xie; Xiaoping Zhang; and Mingqiang Wei

arXiv:2507.20146 · cs.CV · December 29, 2025


TL;DR

AlignFreeNet introduces an end-to-end, alignment-free approach for visible-infrared object detection that effectively handles severe misalignments by leveraging frequency-domain fusion and adaptive compensation, outperforming alignment-based methods.

Contribution

This paper presents a novel alignment-free network with frequency-guided fusion and cross-modal compensation, avoiding explicit alignment and improving robustness in misaligned conditions.

Findings

01 Achieves state-of-the-art performance on multiple datasets.

02 Effectively mitigates severe cross-modal misalignments.

03 Demonstrates robustness and generalization in real-world scenarios.

Abstract

Cross-modal misalignments, such as spatial offsets, resolution discrepancies, and semantic deficiencies, frequently occur in visible-infrared object detection (VI-OD). To mitigate them, existing methods typically follow an alignment-based fusion paradigm, in which an explicit pixel- or feature-level alignment module is inserted before cross-modal fusion. However, pixel-level alignment struggles to cope with severe or mixed misalignments, whereas feature-level alignment often introduces undesirable noise into the fused representations under such conditions, ultimately limiting detection performance. In this paper, we propose a novel alignment-free network (AlignFreeNet) for VI-OD. Unlike prior methods, AlignFreeNet abandons explicit alignment altogether and instead adopts an alignment-free fusion paradigm. Specifically, AlignFreeNet comprises two core modules: variation-guided…
