SPIRIT: Adapting Vision Foundation Models for Unified Single- and Multi-Frame Infrared Small Target Detection
Qian Xu, Xi Li, Fei Gao, Jie Guo, Haojuan Yuan, Shuaipeng Fan, Mingjin Zhang

TL;DR
SPIRIT is a unified framework that adapts vision foundation models for infrared small target detection, effectively handling both single-frame and video data by addressing modality gaps and enhancing target signals.
Contribution
The paper introduces SPIRIT, a novel VFM-compatible IRSTD framework with physics-informed plug-ins for spatial and temporal feature refinement, unifying single- and multi-frame detection.
Findings
Consistent performance improvements over baselines.
Effective suppression of background clutter.
Robust detection in both single- and multi-frame scenarios.
Abstract
Infrared small target detection (IRSTD) is crucial for surveillance and early-warning, with deployments spanning both single-frame analysis and video-mode tracking. A practical solution should leverage vision foundation models (VFMs) to mitigate infrared data scarcity, while adopting a memory-attention-based temporal propagation framework that unifies single- and multi-frame inference. However, infrared small targets exhibit weak radiometric signals and limited semantic cues, which differ markedly from visible-spectrum imagery. This modality gap makes direct use of semantics-oriented VFMs and appearance-driven cross-frame association unreliable for IRSTD: hierarchical feature aggregation can submerge localized target peaks, and appearance-only memory attention becomes ambiguous, leading to spurious clutter associations. To address these challenges, we propose SPIRIT, a unified and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInfrared Target Detection Methodologies · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
