SPIRIT: Adapting Vision Foundation Models for Unified Single- and Multi-Frame Infrared Small Target Detection

Qian Xu; Xi Li; Fei Gao; Jie Guo; Haojuan Yuan; Shuaipeng Fan; Mingjin Zhang

arXiv:2602.01843 · cs.CV · February 3, 2026

Open Access

TL;DR

SPIRIT is a unified framework that adapts vision foundation models for infrared small target detection, effectively handling both single-frame and video data by addressing modality gaps and enhancing target signals.

Contribution

The paper introduces SPIRIT, a novel VFM-compatible IRSTD framework with physics-informed plug-ins for spatial and temporal feature refinement, unifying single- and multi-frame detection.

Findings

01. Consistent performance improvements over baselines.

02. Effective suppression of background clutter.

03. Robust detection in both single- and multi-frame scenarios.

Abstract

Infrared small target detection (IRSTD) is crucial for surveillance and early-warning, with deployments spanning both single-frame analysis and video-mode tracking. A practical solution should leverage vision foundation models (VFMs) to mitigate infrared data scarcity, while adopting a memory-attention-based temporal propagation framework that unifies single- and multi-frame inference. However, infrared small targets exhibit weak radiometric signals and limited semantic cues, which differ markedly from visible-spectrum imagery. This modality gap makes direct use of semantics-oriented VFMs and appearance-driven cross-frame association unreliable for IRSTD: hierarchical feature aggregation can submerge localized target peaks, and appearance-only memory attention becomes ambiguous, leading to spurious clutter associations. To address these challenges, we propose SPIRIT, a unified and…
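The abstract's claim that hierarchical feature aggregation can submerge localized target peaks can be illustrated with a toy example. The sketch below is not SPIRIT's method and not taken from the paper: it assumes plain non-overlapping average pooling as a stand-in for hierarchical aggregation, and the image size, background level, and target intensity are made-up values chosen only to show the effect.

```python
def avg_pool(img, k):
    """Non-overlapping k x k average pooling on a list-of-lists image."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, w, k)
        ]
        for r in range(0, h, k)
    ]


def peak_contrast(img, background):
    """Difference between the brightest value and the known background level."""
    return max(max(row) for row in img) - background


# Hypothetical 8 x 8 infrared patch: flat background with a one-pixel target.
BACKGROUND = 10.0
image = [[BACKGROUND] * 8 for _ in range(8)]
image[3][3] = 110.0

level0 = peak_contrast(image, BACKGROUND)               # full resolution
level1 = peak_contrast(avg_pool(image, 2), BACKGROUND)  # after 2 x 2 pooling
level2 = peak_contrast(avg_pool(image, 4), BACKGROUND)  # after 4 x 4 pooling

print(level0, level1, level2)  # prints: 100.0 25.0 6.25
```

Under these assumptions the single-pixel target loses contrast by a factor of k squared at each pooling scale, which is one simple way a localized peak can fade into the background in coarser feature maps.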

Taxonomy

Topics: Infrared Target Detection Methodologies · Video Surveillance and Tracking Methods · Advanced Neural Network Applications