Espresso: Robust Concept Filtering in Text-to-Image Models
Anudeep Das, Vasisht Duddu, Rui Zhang, N. Asokan

TL;DR
Espresso is a novel CLIP-based concept filtering method for text-to-image models that effectively prevents the generation of images with unacceptable concepts while maintaining utility and robustness against adversarial prompts.
Contribution
We introduce Espresso, the first robust concept filter based on CLIP that improves effectiveness, robustness, and utility preservation in concept removal for text-to-image models.
Findings
Espresso outperforms prior concept removal techniques (CRTs) in effectiveness.
Espresso demonstrates higher robustness against adversarial prompts.
Espresso maintains high utility for acceptable concepts.
Abstract
Diffusion-based text-to-image models are trained on large datasets scraped from the Internet, which may contain unacceptable concepts (e.g., copyright-infringing or unsafe content). We need concept removal techniques (CRTs) that are i) effective in preventing the generation of images with unacceptable concepts, ii) utility-preserving on acceptable concepts, and iii) robust against evasion with adversarial prompts. No prior CRT satisfies all these requirements simultaneously. We introduce Espresso, the first robust concept filter based on Contrastive Language-Image Pre-Training (CLIP). We identify unacceptable concepts by using the distance from the embedding of a generated image to the text embeddings of both unacceptable and acceptable concepts. This lets us fine-tune for robustness by separating the text embeddings of unacceptable and acceptable concepts while preserving utility. We…
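The filtering decision described in the abstract can be illustrated with a minimal sketch. It assumes the image and text embeddings have already been produced by a CLIP image encoder and text encoder; toy 3-dimensional vectors stand in for real CLIP embeddings. The softmax over temperature-scaled cosine similarities follows CLIP-style zero-shot classification. The function names (concept_probs, is_unacceptable) and the temperature of 100 are illustrative assumptions, not Espresso's published implementation, and the fine-tuning step is not shown.

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    return v / np.linalg.norm(v)


def concept_probs(image_emb: np.ndarray,
                  unacceptable_emb: np.ndarray,
                  acceptable_emb: np.ndarray,
                  temperature: float = 100.0) -> tuple[float, float]:
    """Return (p_unacceptable, p_acceptable) for one generated image.

    Assumption: a softmax over temperature-scaled cosine similarities,
    as in CLIP zero-shot classification; temperature=100 is illustrative.
    """
    img = normalize(image_emb)
    sims = np.array([
        img @ normalize(unacceptable_emb),  # closeness to the unacceptable concept
        img @ normalize(acceptable_emb),    # closeness to the acceptable concept
    ])
    logits = temperature * sims
    logits -= logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum()
    return float(probs[0]), float(probs[1])


def is_unacceptable(image_emb: np.ndarray,
                    unacceptable_emb: np.ndarray,
                    acceptable_emb: np.ndarray) -> bool:
    """Hypothetical filter rule: block the image if it is closer to the unacceptable concept."""
    p_u, p_a = concept_probs(image_emb, unacceptable_emb, acceptable_emb)
    return p_u > p_a


if __name__ == "__main__":
    # Toy stand-ins for CLIP text embeddings of the two concepts.
    unacceptable = np.array([1.0, 0.0, 0.0])
    acceptable = np.array([0.0, 1.0, 0.0])
    print(is_unacceptable(np.array([0.9, 0.1, 0.0]), unacceptable, acceptable))  # True
    print(is_unacceptable(np.array([0.1, 0.9, 0.0]), unacceptable, acceptable))  # False
```

Comparing each image against both an unacceptable and an acceptable concept, rather than thresholding similarity to the unacceptable concept alone, is what makes the abstract's fine-tuning idea meaningful: pushing the two text embeddings apart widens the margin between the two similarities, which is the source of the robustness and utility gains the abstract claims.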
Taxonomy
Topics: Data Management and Algorithms · Image Retrieval and Classification Techniques · Machine Learning and Data Classification
Methods: Contrastive Language-Image Pre-training
