Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li, and Dongyue Chen

TL;DR
This paper introduces a novel method for open-vocabulary X-ray prohibited item detection by adapting CLIP with a specialized feature adapter, significantly improving detection of new categories in security scans.
Contribution
It extends CLIP with an X-ray feature adapter within an OVOD framework, effectively bridging domain gaps and enabling detection of novel prohibited items beyond trained categories.
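To make the adapter idea concrete, here is a minimal sketch of how a feature adapter could sit between CLIP region features and text-embedding classification. It is not the paper's implementation: the bottleneck-plus-residual design (in the style of CLIP-Adapter), the blend ratio `alpha`, and the names `adapt`, `classify`, `w_down`, and `w_up` are all assumptions made for illustration, and the tiny hand-written vectors stand in for real CLIP embeddings.

```python
import math

def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def l2_normalize(v):
    # Scale a vector to unit length; zero vectors are returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def adapt(feature, w_down, w_up, alpha=0.5):
    # Hypothetical bottleneck adapter: project down, apply ReLU, project up,
    # then blend with the original feature. alpha=0 leaves the feature as-is.
    hidden = [max(0.0, h) for h in matvec(w_down, feature)]
    adapted = matvec(w_up, hidden)
    return [alpha * a + (1.0 - alpha) * f for a, f in zip(adapted, feature)]

def classify(region_feature, text_embeddings, w_down, w_up, alpha=0.5):
    # Score the adapted region feature against each category's text embedding
    # by cosine similarity and return the best-matching category name.
    feat = l2_normalize(adapt(region_feature, w_down, w_up, alpha))
    scores = {
        name: sum(a * b for a, b in zip(feat, l2_normalize(emb)))
        for name, emb in text_embeddings.items()
    }
    return max(scores, key=scores.get), scores

# Toy 3-dimensional stand-ins for CLIP embeddings (illustrative values only).
text_embeddings = {"knife": [1.0, 0.0, 0.0], "lighter": [0.0, 1.0, 0.0]}
w_down = [[1.0, 0.0, 0.0]]        # 3 -> 1 projection
w_up = [[0.0], [1.0], [0.0]]      # 1 -> 3 projection
label, scores = classify([1.0, 0.0, 0.0], text_embeddings, w_down, w_up, alpha=1.0)
```

Because categories enter only through `text_embeddings`, a new prohibited item can be added at inference time by supplying one more text embedding, which is the property an open-vocabulary detector relies on. In this sketch only the adapter weights would be trained on X-ray data.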
Findings
Outperforms baseline OVOD methods on PIXray and PIDray datasets.
Achieves 15.2 AP50 improvement over previous best on PIXray.
Demonstrates effective domain adaptation for open-vocabulary X-ray detection.
Abstract
X-ray prohibited item detection is an essential component of security checks, and the categories of prohibited items are continuously increasing in accordance with the latest laws. Previous works all focus on closed-set scenarios, which can only recognize the known categories used for training and often require time-consuming and labor-intensive annotation when learning novel categories, resulting in limited real-world applicability. Although the success of vision-language models (e.g., CLIP) provides a new perspective for open-set X-ray prohibited item detection, directly applying CLIP to the X-ray domain leads to a sharp performance drop due to the domain shift between X-ray data and the general data used for pre-training CLIP. To address the aforementioned challenges, we introduce the distillation-based open-vocabulary object detection (OVOD) task into the X-ray security inspection domain by…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Adapter · Balanced Selection · Focus · Contrastive Language-Image Pre-training
