FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models

Jaechul Roh; Andrew Yuan; Jinsong Mao

arXiv:2412.18302·cs.CV·December 25, 2024

TL;DR

FameBias is an attack that manipulates the input embeddings of text-to-image models, without retraining them, so that they generate images of specific public figures. It raises concerns about bias and misuse.

Contribution

This paper introduces FameBias, a new input embedding manipulation technique for T2I models that does not require additional training, enabling targeted biasing of generated images.
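
As a rough illustration of what operating on input embeddings without retraining could look like, the toy sketch below shifts the embedding of a trigger token toward a target embedding by linear interpolation. The summary here does not specify FameBias's actual manipulation, so the interpolation rule, the `alpha` weight, and the names `blend` and `bias_prompt_embeddings` are all assumptions made for illustration. The vectors are hand-written 2-D stand-ins, not outputs of a real text encoder.

```python
from typing import List, Sequence

Vector = List[float]


def blend(base: Sequence[float], target: Sequence[float], alpha: float) -> Vector:
    """Linear interpolation: (1 - alpha) * base + alpha * target.

    Hypothetical rule chosen for illustration; not taken from the paper.
    """
    if len(base) != len(target):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * b + alpha * t for b, t in zip(base, target)]


def bias_prompt_embeddings(
    tokens: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    trigger: str,
    target_embedding: Sequence[float],
    alpha: float = 0.5,
) -> List[Vector]:
    """Return a copy of the per-token embeddings in which every token equal to
    `trigger` is moved toward `target_embedding`; other tokens are unchanged.
    No model weights are touched, only the input vectors.
    """
    if len(tokens) != len(embeddings):
        raise ValueError("need exactly one embedding per token")
    return [
        blend(vec, target_embedding, alpha) if tok == trigger else list(vec)
        for tok, vec in zip(tokens, embeddings)
    ]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for text-encoder outputs (assumed values).
    tokens = ["a", "photo", "of", "a", "politician"]
    embeddings = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
    target = [0.0, 1.0]  # hypothetical embedding of the target concept
    biased = bias_prompt_embeddings(tokens, embeddings, "politician", target, alpha=0.25)
    for tok, vec in zip(tokens, biased):
        print(tok, vec)
```

In this sketch a small `alpha` keeps the edited vector close to the original, which is one plausible way to think about steering the output while leaving most of the prompt's meaning intact.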

Findings

01. High attack success rate in generating targeted images.

02. Preserves semantic meaning of prompts.

03. Effective across multiple trigger-target pairs.

Abstract

Text-to-Image (T2I) diffusion models have rapidly advanced, enabling the generation of high-quality images that align closely with textual descriptions. However, this progress has also raised concerns about their misuse for propaganda and other malicious activities. Recent studies reveal that attackers can embed biases into these models through simple fine-tuning, causing them to generate targeted imagery when triggered by specific phrases. This underscores the potential for T2I models to act as tools for disseminating propaganda, producing images aligned with an attacker's objective for end-users. Building on this concept, we introduce FameBias, a T2I biasing attack that manipulates the embeddings of input prompts to generate images featuring specific public figures. Unlike prior methods, FameBias operates solely on the input embedding vectors without requiring additional model…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection

Methods: Diffusion · ALIGN