CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal

TL;DR
This paper introduces CLIP-Inspector, a model-level backdoor detection method for prompt-tuned CLIP models, capable of reconstructing triggers and verifying model integrity using out-of-distribution images.
Contribution
It presents a backdoor detection approach that reconstructs triggers to verify prompt-tuned CLIP models, covering backdoors that leave the encoders untouched and are therefore missed by existing encoder-focused methods.
Findings
Achieves 94% detection accuracy across ten datasets and four backdoor attacks.
Reconstructs effective triggers in a single epoch with only 1,000 OOD images.
Outperforms existing trigger-inversion baselines with higher AUROC scores.
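For readers unfamiliar with the AUROC metric mentioned above, here is a minimal sketch of how it can be computed for a model-level detector. It assumes the detector assigns each inspected model a single suspicion score (higher means more likely backdoored). The function name `auroc` and the example scores are hypothetical illustrations, not values or code from the paper.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen backdoored model (label 1)
    receives a higher score than a randomly chosen clean model (label 0),
    with ties counted as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one model of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical suspicion scores for six inspected models (not from the paper).
scores = [0.91, 0.75, 0.40, 0.62, 0.15, 0.30]
labels = [1, 1, 1, 0, 0, 0]  # 1 = backdoored, 0 = clean
value = auroc(scores, labels)
```

An AUROC of 1.0 means every backdoored model scores above every clean one, while 0.5 corresponds to chance-level ranking. Because the metric depends only on the ordering of scores, it compares detectors without fixing a decision threshold.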
Abstract
Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch. This semi-honest setting creates a security risk: a malicious provider can follow the prompt-tuning protocol yet implant a backdoor, forcing triggered inputs to be classified into an attacker-chosen class, even for out-of-distribution (OOD) data. Such backdoors leave the encoders untouched, making them undetectable by existing methods that focus on encoder corruption. Data-level methods that sanitize data before training or during inference also fail to answer the critical question, "Is the delivered model backdoored or not?" To address this model-level verification problem, we introduce CLIP-Inspector (CI),…
