Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation

Yuanhang Yao; Ping Qian; Zhu Liu; Long Ma; Weimin Wang

arXiv:2605.14346 · cs.CV · May 15, 2026

TL;DR

This paper introduces a hierarchical knowledge distillation framework that draws semantic conditioning from a frozen Vision Foundation Model (VFM) to improve the stability and accuracy of point-supervised infrared small target detection.

Contribution

The paper formulates point-supervised learning as a bilevel optimization problem and combines it with Semantic-Conditioned Affine Modulation (SCAM) and collaborative learning to improve pseudo-label quality and training stability.

Findings

01 Consistent improvements in detection accuracy across multiple backbones.

02 Enhanced training stability with pseudo-label noise mitigation.

03 Effective use of a frozen Vision Foundation Model as a semantic prior.

Abstract

Single-frame Infrared Small Target Detection (ISTD) aims to localize weak targets under heavy background clutter, yet dense pixel-wise annotations are expensive. Point supervision with online label evolution reduces annotation cost; however, lightweight CNN detectors often lack sufficient semantics, leading to noisy pseudo-masks and unstable optimization. To address this, we propose a hierarchical VFM-driven knowledge distillation framework that uses a frozen Vision Foundation Model (VFM) during training. We formulate point-supervised learning as a bilevel optimization process: the inner loop adapts a VFM-embedded teacher on reweighted training samples, while the outer loop transfers validation-guided knowledge to a lightweight student to mitigate pseudo-label noise and training-set bias. We further introduce Semantic-Conditioned Affine Modulation (SCAM) to inject VFM semantics into CNN…
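
The abstract is cut off before it says how SCAM injects VFM semantics into the CNN, so the following is only an illustrative sketch of generic feature-wise affine modulation (a per-channel scale and shift predicted from a semantic vector), not the authors' formulation. The function name `affine_modulate`, the pooled `semantic` vector, and the projection matrices `w_gamma` and `w_beta` are all assumptions introduced here for illustration.

```python
def affine_modulate(features, semantic, w_gamma, w_beta):
    """Apply a per-channel affine transform conditioned on a semantic vector.

    Hypothetical sketch, not the paper's SCAM module.
    features: list of C channels, each a flat list of pixel values.
    semantic: list of D floats (assumed pooled embedding from a frozen VFM).
    w_gamma, w_beta: C x D projection matrices (assumed learned parameters).
    """
    modulated = []
    for c, channel in enumerate(features):
        # Scale is offset by 1 so that zero weights leave the features unchanged.
        gamma = 1.0 + sum(w * s for w, s in zip(w_gamma[c], semantic))
        # Shift is a plain linear projection of the semantic vector.
        beta = sum(w * s for w, s in zip(w_beta[c], semantic))
        modulated.append([gamma * x + beta for x in channel])
    return modulated


# Toy usage: two channels of two pixels each, and a 2-D semantic vector.
features = [[1.0, 2.0], [3.0, 4.0]]
semantic = [1.0, 0.5]
w_gamma = [[0.0, 0.0], [1.0, 2.0]]  # channel 0 keeps scale 1, channel 1 gets scale 3
w_beta = [[0.5, 0.0], [0.0, 0.0]]   # channel 0 is shifted by 0.5, channel 1 is not
result = affine_modulate(features, semantic, w_gamma, w_beta)
```

Offsetting the scale by 1 is a common design choice for this kind of conditioning: with zero projection weights the CNN features pass through unchanged, so the semantic signal starts as a perturbation rather than replacing the features.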

Code & Models

Repositories

yuanhang-yao/semantic-prior
github
