Smart Eyes for Silent Threats: VLMs and In-Context Learning for THz Imaging
Nicolas Poggi, Shashank Agnihotri, Margret Keuper

TL;DR
This paper explores the use of Vision-Language Models with In-Context Learning for THz imaging classification, demonstrating improved performance without fine-tuning in low-data scenarios.
Contribution
It introduces the first application of ICL-enhanced VLMs to THz imaging, adapting open-weight models with a modality-aligned prompting framework for zero-shot and one-shot classification.
Findings
ICL improves classification accuracy in low-data regimes
VLMs provide interpretable results for THz images
Zero-shot and one-shot settings are effective for THz domain
Abstract
Terahertz (THz) imaging enables non-invasive analysis for applications such as security screening and material classification, but effective image classification remains challenging due to limited annotations, low resolution, and visual ambiguity. We introduce In-Context Learning (ICL) with Vision-Language Models (VLMs) as a flexible, interpretable alternative that requires no fine-tuning. Using a modality-aligned prompting framework, we adapt two open-weight VLMs to the THz domain and evaluate them under zero-shot and one-shot settings. Our results show that ICL improves classification and interpretability in low-data regimes. This is the first application of ICL-enhanced VLMs to THz imaging, offering a promising direction for resource-constrained scientific domains. Code: \href{https://github.com/Nicolas-Poggi/Project_THz_Classification/tree/main}{GitHub repository}.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
