CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Akshit Jindal; Saket Anand; Chetan Arora; Vikram Goyal

arXiv:2604.09101·cs.CR·April 13, 2026

TL;DR

This paper introduces CLIP-Inspector, a model-level backdoor detection method for prompt-tuned CLIP models, capable of reconstructing triggers and verifying model integrity using out-of-distribution images.

Contribution

The paper presents a novel backdoor detection approach that reconstructs triggers and verifies prompt-tuned CLIP models, addressing the limitations of existing encoder-focused methods.

Findings

01 Achieves 94% detection accuracy across ten datasets and four backdoor attacks.

02 Reconstructs effective triggers in a single epoch with only 1,000 OOD images.

03 Outperforms existing trigger-inversion baselines with higher AUROC scores.

Abstract

Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch. This semi-honest setting creates a security risk: a malicious provider can follow the prompt-tuning protocol yet implant a backdoor that forces triggered inputs into an attacker-chosen class, even for out-of-distribution (OOD) data. Such backdoors leave the encoders untouched, making them undetectable by existing methods that focus on encoder corruption. Data-level methods that sanitize data before training or during inference also fail to answer the critical question, "Is the delivered model backdoored or not?" To address this model-level verification problem, we introduce CLIP-Inspector (CI),…
