ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training

Xin Yao; Haiyang Zhao; Yimin Chen; Jiawei Guo; Kecheng Huang; Ming Zhao

arXiv:2511.00446 · cs.CV · November 4, 2025

TL;DR

ToxicTextCLIP introduces a method to generate adversarial texts that can poison or backdoor CLIP models during pre-training, exposing vulnerabilities in the model's reliance on web-sourced data.

Contribution

The paper presents a framework for creating high-quality adversarial texts that target CLIP. It addresses two challenges: semantic misalignment caused by background inconsistency with the target class, and the scarcity of background-consistent texts.

Findings

01  Achieves a poisoning success rate of up to 95.83%

02  Reaches a backdoor Hit@1 of 98.68%

03  Bypasses existing defenses such as RoCLIP, CleanCLIP, and SafeCLIP
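
For readers unfamiliar with the Hit@1 figure above, the following is a minimal sketch of the usual retrieval definition of Hit@k: the fraction of queries whose target item appears among the top k ranked results. The paper's exact evaluation protocol is not given on this page, so the function name `hit_at_k`, its arguments, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def hit_at_k(
    ranked_ids: Sequence[Sequence[int]],
    target_ids: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose target id is among the top-k ranked ids.

    ranked_ids[i] is the retrieval ranking for query i (best match first);
    target_ids[i] is the id that counts as a hit for query i.
    Both names are hypothetical, not taken from the paper.
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("ranked_ids and target_ids must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranking, target in zip(ranked_ids, target_ids)
        if target in ranking[:k]
    )
    return hits / len(ranked_ids)


# Toy data: four queries, each with a ranking over three candidate ids.
rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1], [1, 0, 2]]
targets = [3, 2, 2, 0]

print(hit_at_k(rankings, targets, k=1))  # 0.5
print(hit_at_k(rankings, targets, k=2))  # 1.0
```

In a backdoor setting, the target for each triggered query would presumably be the attacker's chosen item, so a higher Hit@1 means a more effective backdoor. That reading is an assumption here, not a statement of the paper's protocol.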

Abstract

The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from large-scale web data through self-supervised contrastive learning. Yet, its reliance on uncurated Internet-sourced data exposes it to data poisoning and backdoor risks. While existing studies primarily investigate image-based attacks, the text modality, which is equally central to CLIP's training, remains underexplored. In this work, we introduce ToxicTextCLIP, a framework for generating high-quality adversarial texts that target CLIP during the pre-training phase. The framework addresses two key challenges: semantic misalignment caused by background inconsistency with the target class, and the scarcity of background-consistent texts. To this end, ToxicTextCLIP iteratively applies: 1) a background-aware selector that prioritizes texts with…
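
The contrastive alignment of image-text pairs mentioned in the abstract can be illustrated with a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. This is a plain-Python illustration, not the authors' code. The function names, the temperature of 0.07, and the toy embeddings are assumptions, and embeddings are assumed to be non-zero vectors.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def contrastive_loss(
    image_embs: List[Vector],
    text_embs: List[Vector],
    temperature: float = 0.07,  # assumed value for illustration
) -> float:
    """Symmetric cross-entropy over cosine similarities of paired embeddings.

    Pair i (image_embs[i], text_embs[i]) is the positive; every other
    combination in the batch is treated as a negative.
    """
    n = len(image_embs)
    if n == 0 or n != len(text_embs):
        raise ValueError("need equally sized, non-empty batches")
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: row i should select column i.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should select row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)


# Toy batch: correctly paired captions give a lower loss than swapped ones.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
print(contrastive_loss(images, aligned_texts) < contrastive_loss(images, swapped_texts))
```

Because training minimizes this loss over whatever image-text pairs the dataset supplies, a caption that an attacker attaches to an image is treated as a genuine positive pair. This is why the text modality matters as an attack surface.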

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Hate Speech and Cyberbullying Detection