Harm Amplification in Text-to-Image Models
Susan Hao, Renee Shelby, Yuchi Liu, Hansa Srinivasan, Mukul Bhutani, Burcu Karagol Ayan, Ryan Poplin, Shivani Poddar, Sarah Laszlo

TL;DR
This paper investigates harm amplification in text-to-image models, formalizing its definition, proposing methodologies to measure it, and empirically analyzing its impact across different user groups to enhance safety in generative AI.
Contribution
It introduces a formal definition of harm amplification, develops a framework for quantification, and empirically assesses its impact, especially regarding gender disparities, in T2I models.
Findings
T2I models can produce outputs that are more harmful than the user's input prompt suggests.
The proposed methodologies quantify harm amplification under different deployment scenarios.
The framework can measure disparate impacts, such as gender-related differences in harm amplification.
Abstract
Text-to-image (T2I) models have emerged as a significant advancement in generative AI; however, there exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts. This phenomenon, where T2I models generate harmful representations that were not explicit in the input prompt, poses a potentially greater risk than adversarial prompts, leaving users unintentionally exposed to harms. Our paper addresses this issue by formalizing a definition for this phenomenon which we term harm amplification. We further contribute to the field by developing a framework of methodologies to quantify harm amplification in which we consider the harm of the model output in the context of user input. We then empirically examine how to apply these different methodologies to simulate real-world deployment scenarios including a quantification of…
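As a rough illustration of what "harm of the model output in the context of user input" could look like in practice, the sketch below flags a generation as amplified when the image's harm score exceeds the prompt's harm score by more than a margin. It then reports the flagged fraction overall and per group. This is a minimal sketch under stated assumptions, not the paper's method: the `Sample` fields, the `margin` value, the group labels, and all scores are hypothetical, and the harm scores are assumed to come from some text and image safety classifiers that return values in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Sample:
    prompt_harm: float  # hypothetical harm score of the text prompt, in [0, 1]
    image_harm: float   # hypothetical harm score of the generated image, in [0, 1]
    group: str          # hypothetical group label attached to the sample


def is_amplified(sample: Sample, margin: float = 0.2) -> bool:
    """Flag a sample whose image scores more harmful than its prompt by more than `margin`."""
    return sample.image_harm - sample.prompt_harm > margin


def amplification_rate(samples, margin: float = 0.2) -> float:
    """Fraction of samples flagged as amplified (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return sum(is_amplified(s, margin) for s in samples) / len(samples)


def rate_by_group(samples, margin: float = 0.2) -> dict:
    """Amplification rate computed separately for each group label."""
    groups = {}
    for s in samples:
        groups.setdefault(s.group, []).append(s)
    return {g: amplification_rate(members, margin) for g, members in groups.items()}


# Made-up scores for illustration only; these are not data from the paper.
samples = [
    Sample(prompt_harm=0.1, image_harm=0.7, group="a"),   # safe prompt, harmful image
    Sample(prompt_harm=0.1, image_harm=0.2, group="a"),
    Sample(prompt_harm=0.8, image_harm=0.9, group="b"),   # harmful prompt, little added harm
    Sample(prompt_harm=0.3, image_harm=0.35, group="b"),
]

print(amplification_rate(samples))
print(rate_by_group(samples))
```

Comparing the image score against the prompt score, rather than thresholding the image score alone, is what separates amplification from plain harmful output: the third sample has a high image score but is not flagged, because the prompt was already scored as harmful. A gap between the per-group rates is one simple signal of disparate impact.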
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
