DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance
Yinghui Xing, Xiaoting Su, Shizhou Zhang, Donghao Chu, Di Xu

TL;DR
DuGI-MAE combines dual-domain guidance, a masking strategy based on high-entropy tokens, and a large infrared dataset, significantly improving performance on infrared image understanding tasks over existing models.
Contribution
The paper presents DuGI-MAE, a novel infrared foundation model that employs dual-domain guidance and entropy-based masking, advancing infrared image interpretation beyond prior models.
Findings
Outperforms existing models in infrared object detection
Enhances semantic segmentation accuracy in infrared images
Demonstrates strong generalization across multiple infrared tasks
Abstract
Infrared imaging plays a critical role in low-light and adverse weather conditions. However, due to the distinct characteristics of infrared images, existing foundation models such as Masked Autoencoder (MAE) trained on visible data perform suboptimally in infrared image interpretation tasks. To bridge this gap, an infrared foundation model known as InfMAE was developed and pre-trained on large-scale infrared datasets. Despite its effectiveness, InfMAE still faces several limitations, including the omission of informative tokens, insufficient modeling of global associations, and neglect of non-uniform noise. In this paper, we propose a Dual-domain Guided Infrared foundation model based on MAE (DuGI-MAE). First, we design a deterministic masking strategy based on token entropy, preserving only high-entropy tokens for reconstruction to enhance informativeness. Next, we introduce a…
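The entropy-based token selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the patch size, the 16-bin intensity histogram, the `keep_ratio` parameter, and all function names are assumptions chosen for illustration, and the abstract does not specify how token entropy is computed. The sketch scores each non-overlapping patch of a grayscale image by the Shannon entropy of its quantized intensities and deterministically keeps the highest-scoring patches.

```python
import math
from collections import Counter


def patch_entropy(pixels, bins=16, max_value=255):
    """Shannon entropy (in bits) of a flat list of intensities, quantized into `bins` levels."""
    counts = Counter(min(p * bins // (max_value + 1), bins - 1) for p in pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into non-overlapping patch x patch tokens, in row-major order."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h - h % patch, patch):
        for c in range(0, w - w % patch, patch):
            tokens.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens


def select_high_entropy_tokens(image, patch=4, keep_ratio=0.25):
    """Return the sorted indices of the highest-entropy tokens and all token scores.

    Hypothetical helper: `keep_ratio` is an illustrative parameter, not a value from the paper.
    """
    tokens = split_into_patches(image, patch)
    scores = [patch_entropy(t) for t in tokens]
    k = max(1, round(keep_ratio * len(tokens)))
    # Sort by descending entropy; ties are broken by index, so the choice is deterministic.
    order = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    return sorted(order[:k]), scores


# Toy 8x8 image: three flat patches and one textured patch (bottom-left, token index 2).
image = [[0] * 8 for _ in range(8)]
for i in range(4, 8):
    for j in range(4):
        image[i][j] = ((i - 4) * 4 + j) * 16

kept, scores = select_high_entropy_tokens(image, patch=4, keep_ratio=0.25)
print(kept, scores)
```

In this toy example the flat patches score zero entropy and only the textured patch is selected. Unlike the random masking of a standard MAE, the selection depends only on image content, which is what makes it deterministic.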
Taxonomy
Topics: Advanced Neural Network Applications · Infrared Target Detection Methodologies · Image Enhancement Techniques
