WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning
Chunjin Yang, Xiwei Zhang, Yiming Xiao, Fanman Meng

TL;DR
WD-FQDet is a multispectral detection transformer that uses wavelet decomposition and frequency-aware query learning to improve infrared-visible object detection by explicitly modeling frequency-domain features.
Contribution
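The paper introduces a framework that decouples modality-shared and modality-specific information in infrared and visible images by viewing them in the low- and high-frequency domains, then fuses each with a strategy tailored to its frequency characteristics.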
Findings
Achieves state-of-the-art results on FLIR, LLVIP, and M3FD datasets.
Effectively aligns shared features and preserves modality-specific features.
Dynamic frequency-aware query regulation improves detection across scenarios.
Abstract
Infrared-visible object detection improves detection performance by combining complementary features from multispectral images. Existing backbone-specific and backbone-shared approaches still suffer from severely biased modality-shared features and insufficient modality-specific features. To address these issues, we propose WD-FQDet, a novel detection framework that explicitly decouples modality-shared and modality-specific information in the infrared and visible modalities from the new perspective of low- and high-frequency domains, allowing fusion strategies tailored to their frequency characteristics. Specifically, a low-frequency homogeneity alignment module is proposed to align modality-shared features across modalities via a cross-modal attention mechanism, and a high-frequency specificity retention module is proposed to preserve modality-specific features through the…
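To make the low- versus high-frequency split concrete, the sketch below applies a single-level 2D Haar wavelet transform to a small image stored as nested lists. It is an illustration only: the excerpt above does not say which wavelet, how many decomposition levels, or which feature maps WD-FQDet uses, so the Haar filter and the function names `haar_dwt2` and `haar_idwt2` are assumptions made for this example, not the authors' implementation.

```python
def _zeros(rows, cols):
    """Return a rows x cols grid of floats."""
    return [[0.0] * cols for _ in range(rows)]


def haar_dwt2(image):
    """Single-level 2D Haar transform (hypothetical helper for illustration).

    Returns (ll, (lh, hl, hh)): ll is the low-frequency approximation and
    the other three are high-frequency detail bands. Sub-band naming
    conventions vary between libraries.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    h, w = len(image), len(image[0])
    if h % 2 or w % 2 or any(len(row) != w for row in image):
        raise ValueError("image must be rectangular with even height and width")

    ll, lh, hl, hh = (_zeros(h // 2, w // 2) for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]          # top row of the 2x2 block
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom row of the block
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2.0  # local average: smooth content
            lh[r][s] = (a + b - c - d) / 2.0  # change between rows
            hl[r][s] = (a - b + c - d) / 2.0  # change between columns
            hh[r][s] = (a - b - c + d) / 2.0  # diagonal change
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Invert haar_dwt2, reconstructing the original image."""
    lh, hl, hh = highs
    h, w = len(ll), len(ll[0])
    out = _zeros(2 * h, 2 * w)
    for r in range(h):
        for s in range(w):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            out[2 * r][2 * s] = (p + q + u + v) / 2.0
            out[2 * r][2 * s + 1] = (p + q - u - v) / 2.0
            out[2 * r + 1][2 * s] = (p - q + u - v) / 2.0
            out[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2.0
    return out


if __name__ == "__main__":
    # Toy 4x4 "image"; in practice this would be one channel of an
    # infrared or visible frame (or a feature map).
    toy = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    low, (rows_detail, cols_detail, diag_detail) = haar_dwt2(toy)
    print("low-frequency band:", low)
    print("column-detail band:", cols_detail)
```

Because the transform is invertible, splitting features this way loses no information: a model can process the smooth low-frequency band and the edge-like high-frequency bands with different fusion strategies and still recover the full signal.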
