Exploring the Boundaries of Content Moderation in Text-to-Image Generation

Piera Riccio; Georgina Curto; Nuria Oliver

arXiv:2409.17155·cs.CY·September 27, 2024

TL;DR

This study evaluates the alignment between safety guidelines and actual content moderation in text-to-image models, revealing discrepancies and advocating for transparency and inclusive policies to address societal impacts.

Contribution

It provides an empirical analysis of safety guideline adherence and model behavior, highlighting the gap between policies and real-world outputs in T2I platforms.

Findings

01 Discrepancy between safety guidelines and model outputs

02 Evidence of over-censorship in content moderation

03 Need for transparency and inclusive moderation practices

Abstract

This paper analyzes the community safety guidelines of five text-to-image (T2I) generation platforms and audits five T2I models, focusing on prompts related to the representation of humans in areas that might lead to societal stigma. While current research primarily focuses on ensuring safety by restricting the generation of harmful content, our study offers a complementary perspective. We argue that the concept of safety is difficult to define and operationalize, reflected in a discrepancy between the officially published safety guidelines and the actual behavior of the T2I models, and leading at times to over-censorship. Our findings call for more transparency and an inclusive dialogue about the platforms' content moderation practices, bearing in mind their global cultural and social impact.

Taxonomy

Topics: Hate Speech and Cyberbullying Detection