Contrast-Guided Cross-Modal Distillation for Thermal Object Detection
SiWoo Kim, JhongHyun An

TL;DR
This paper proposes a training-only method for thermal object detection that enhances thermal features by aligning them with RGB features and sharpening decision boundaries, leading to improved night-time detection accuracy.
Contribution
It introduces a novel cross-modal distillation training approach that improves thermal detection without extra sensors or test-time fusion.
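The page does not spell out the distillation loss, so the following is only a minimal sketch of one common way to align thermal features with RGB features, not the authors' implementation. It assumes that row i of each input holds the feature vector for the same object instance in spatially aligned thermal and RGB images. It also assumes that the RGB branch acts as a fixed teacher. The cosine-distance form and the names `l2_normalize` and `alignment_loss` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector, eps: float = 1e-12) -> List[float]:
    """Scale a feature vector to unit length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def alignment_loss(thermal_feats: Sequence[Vector], rgb_feats: Sequence[Vector]) -> float:
    """Mean cosine distance (1 - cos) between paired thermal and RGB features.

    Hypothetical sketch: row i of each input is assumed to describe the same
    object instance in the two modalities. The RGB features play the role of
    a fixed teacher, so in a real training loop only the thermal (student)
    branch would receive gradients. At test time only the thermal branch runs.
    """
    if len(thermal_feats) != len(rgb_feats) or not thermal_feats:
        raise ValueError("expected equal, non-zero numbers of paired features")
    total = 0.0
    for t, r in zip(thermal_feats, rgb_feats):
        t_hat, r_hat = l2_normalize(t), l2_normalize(r)
        cos = sum(a * b for a, b in zip(t_hat, r_hat))
        total += 1.0 - cos  # 0 when directions match, 1 when orthogonal
    return total / len(thermal_feats)
```

Pairs that point in the same direction contribute zero loss regardless of magnitude, and orthogonal pairs contribute one. A cosine form like this constrains only feature direction, which is one plausible reason to prefer it over a raw squared error when the two modalities produce features at different scales.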
Findings
Outperforms prior methods in thermal object detection
Achieves state-of-the-art performance on benchmark datasets
Enhances thermal features using RGB-guided training objectives
Abstract
Robust perception at night remains challenging for thermal-infrared (TIR) detection: low contrast and weak high-frequency cues lead to duplicate, overlapping boxes, missed small objects, and class confusion. Prior remedies either translate TIR images to RGB and hope that pixel fidelity transfers to detection, which makes performance fragile to color or structure artifacts, or fuse RGB and TIR at test time, which requires extra sensors, precise calibration, and higher runtime cost. Both lines of work can help in favorable conditions, but neither directly shapes the thermal representation used by the detector. We keep mono-modality inference and tackle the root causes during training. Specifically, we introduce training-only objectives that sharpen instance-level decision boundaries by pulling together features of the same class and pushing apart those of different classes, suppressing duplicate and confusing…
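The abstract describes objectives that pull same-class instance features together and push different-class features apart, but it does not give the exact loss. The sketch below substitutes the standard supervised contrastive (SupCon) loss, which has that pull/push behavior, to illustrate the idea. It should not be read as the paper's formulation. The function names and the default temperature of 0.1 are assumptions.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a feature vector to unit length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def supcon_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss over per-instance embeddings.

    For each anchor i, positives are the other instances with the same label.
    The loss is the negative log-probability that the anchor matches a
    positive (via temperature-scaled cosine similarity), averaged over its
    positives and then over anchors. Anchors with no positive are skipped.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    z = [l2_normalize(f) for f in features]
    n = len(z)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no same-class partner: nothing to pull together
        # Similarity logits to every other instance (the anchor itself is excluded).
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average of -log softmax over positives: low when positives are close
        # to the anchor and other classes are far away.
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch, embeddings that are already grouped by class yield a lower loss than the same embeddings with mixed labels. That difference is the signal such an objective would use to sharpen decision boundaries between classes.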
Taxonomy
Topics: Advanced Neural Network Applications · Infrared Target Detection Methodologies · Thermography and Photoacoustic Techniques
