An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

Wenxuan Wang; Jingyuan Huang; Jen-tse Huang; Chang Chen; Jiazhen Gu; Pinjia He; Michael R. Lyu

arXiv:2308.09810 · cs.SE · August 22, 2023

Open Access

TL;DR

This paper introduces OASIS, a metamorphic testing framework that generates toxic image test cases to evaluate and improve the robustness of content moderation software against evasion tactics.

Contribution

OASIS is the first metamorphic testing framework specifically designed for content moderation, leveraging real-world toxic content to identify vulnerabilities in commercial and research models.

Findings

01

OASIS achieves an error detection rate of up to 100% on the tested models.

02

Retraining with OASIS-generated data enhances model robustness.

03

The framework uncovers significant vulnerabilities in current moderation tools.

Abstract

The exponential growth of social media platforms has brought about a revolution in communication and content dissemination in human society. Nevertheless, these platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography, leading to severe negative consequences such as harm to teenagers' mental health. Despite tremendous efforts in developing and deploying textual and image content moderation methods, malicious users can evade moderation by embedding texts into images, such as screenshots of the text, usually with some interference. We find that modern content moderation software's performance against such malicious inputs remains underexplored. In this work, we propose OASIS, a metamorphic testing framework for content moderation software. OASIS employs 21 transform rules summarized from our pilot study on 5,000…
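The metamorphic relation described in the abstract can be sketched roughly as follows: an image produced by transforming a toxic seed should still be judged toxic, and every case where the verdict flips counts as a detected error. This is a minimal illustration under stated assumptions, not the OASIS implementation. `ImageCase`, the three transforms, `toy_moderator`, and `error_finding_rate` are all hypothetical names; the transforms are placeholders rather than any of the paper's 21 rules, and no pixels are actually rendered.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ImageCase:
    """Hypothetical stand-in for a rendered image: the embedded text plus rendering knobs."""
    text: str
    noise: float = 0.0   # assumed fraction of perturbed pixels
    rotation: int = 0    # assumed rotation in degrees


Transform = Callable[[ImageCase], ImageCase]


# Placeholder transforms (assumptions for illustration, not the paper's rules).
def add_noise(case: ImageCase) -> ImageCase:
    return replace(case, noise=0.3)


def rotate(case: ImageCase) -> ImageCase:
    return replace(case, rotation=15)


def space_out(case: ImageCase) -> ImageCase:
    # Insert a space between every character of the embedded text.
    return replace(case, text=" ".join(case.text))


def error_finding_rate(
    seeds: Sequence[str],
    transforms: Sequence[Transform],
    is_toxic: Callable[[ImageCase], bool],
) -> Tuple[float, List[ImageCase]]:
    """Apply every transform to every toxic seed and collect verdict flips.

    Metamorphic relation assumed here: a transformed toxic seed must still be
    flagged as toxic, so any case the moderator passes is a failure.
    """
    cases = [t(ImageCase(seed)) for seed in seeds for t in transforms]
    failures = [c for c in cases if not is_toxic(c)]
    rate = len(failures) / len(cases) if cases else 0.0
    return rate, failures


def toy_moderator(case: ImageCase) -> bool:
    """Toy system under test: flags only a verbatim keyword in a low-noise image."""
    return "badword" in case.text.lower() and case.noise < 0.2


rate, failures = error_finding_rate(
    seeds=["you are a BADWORD"],
    transforms=[add_noise, rotate, space_out],
    is_toxic=toy_moderator,
)
print(f"error finding rate: {rate:.2f}, failing cases: {len(failures)}")
```

In a real setup, `is_toxic` would wrap a call to the moderation system under test, and each transform would produce an actual image file; the failing cases are the ones that could later be used for retraining.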

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques

Methods: OASIS