MTTM: Metamorphic Testing for Textual Content Moderation Software

Wenxuan Wang; Jen-tse Huang; Weibin Wu; Jianping Zhang; Yizhan Huang; Shuqing Li; Pinjia He; Michael Lyu

arXiv:2302.05706 · cs.CL · February 14, 2023 · 1 citation

TL;DR

This paper introduces MTTM, a metamorphic testing framework for evaluating and improving how well textual content moderation software detects toxic content, especially toxic content that malicious users have minimally modified to evade moderation.

Contribution

The paper presents a novel metamorphic testing approach for textual content moderation software, comprising metamorphic relations and the perturbations that generate test cases from them, and demonstrates that the approach reveals vulnerabilities and can be used to improve model robustness.
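
To make the idea of a metamorphic relation concrete, here is a minimal Python sketch under stated assumptions. The relation checked is the generic metamorphic one: a toxic seed sentence and its slightly perturbed follow-up should receive the same moderation verdict. Everything named below is hypothetical and for illustration only: `toy_moderator` is a naive keyword filter standing in for real moderation software, and `insert_space`, `homoglyph`, and `swap_chars` are our own illustrative character-level perturbations, not necessarily MTTM's operators. The `error_finding_rate` function computes a simple violation rate, an assumed analogue of the percentages reported under Findings; the paper's exact definition may differ.

```python
from typing import Callable, List

# Hypothetical stand-in for moderation software: flags text as toxic
# if any whole token (ignoring trailing punctuation) is blocklisted.
BLOCKLIST = {"idiot", "stupid"}

def toy_moderator(text: str) -> bool:
    return any(tok.strip(".,!?") in BLOCKLIST for tok in text.lower().split())

# Illustrative character-level perturbations (assumptions, not MTTM's exact set).
def insert_space(word: str) -> str:
    # Split the word in the middle with a space.
    mid = len(word) // 2
    return word[:mid] + " " + word[mid:]

HOMOGLYPHS = {"i": "1", "o": "0", "e": "3"}  # hypothetical look-alike mapping

def homoglyph(word: str) -> str:
    # Replace letters with visually similar digits.
    return "".join(HOMOGLYPHS.get(c, c) for c in word)

def swap_chars(word: str) -> str:
    # Swap the second and third characters of words with 4+ characters.
    if len(word) < 4:
        return word
    return word[0] + word[2] + word[1] + word[3:]

def perturb_longest(sentence: str, op: Callable[[str], str]) -> str:
    # Heuristic assumption: perturb the longest token (first one on ties).
    tokens = sentence.split()
    i = max(range(len(tokens)), key=lambda k: len(tokens[k]))
    tokens[i] = op(tokens[i])
    return " ".join(tokens)

def error_finding_rate(seeds: List[str],
                       moderator: Callable[[str], bool],
                       ops: List[Callable[[str], str]]) -> float:
    # Metamorphic relation: moderator(follow_up) must equal moderator(seed).
    # Seeds are assumed to be known-toxic sentences.
    total = errors = 0
    for seed in seeds:
        expected = moderator(seed)
        for op in ops:
            follow_up = perturb_longest(seed, op)
            total += 1
            if moderator(follow_up) != expected:  # relation violated
                errors += 1
    return errors / total if total else 0.0

if __name__ == "__main__":
    seeds = ["you are an idiot", "that was stupid"]
    ops = [insert_space, homoglyph, swap_chars]
    for s in seeds:
        for op in ops:
            f = perturb_longest(s, op)
            print(f"{f!r}: flagged={toy_moderator(f)}")
    print("violation rate:", error_finding_rate(seeds, toy_moderator, ops))
```

With these two seeds, the keyword filter flags both originals but none of the six follow-ups (for example, "you are an id iot" or "that was stup1d"), so every test case violates the relation and the violation rate is 1.0. A real moderation service would be far harder to fool; the point of the sketch is only the shape of the test: perturb, query, and compare against the source verdict without needing a ground-truth label for each follow-up.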

Findings

1. MTTM achieves error finding rates of up to 83.9% when testing commercial content moderation software.
2. It achieves error finding rates of up to 91.2% when testing state-of-the-art moderation algorithms.
3. Retraining models on MTTM-generated test cases significantly improves their robustness.

Abstract

The exponential growth of social media platforms such as Twitter and Facebook has revolutionized textual communication and textual content publication in human society. However, these platforms have been increasingly exploited to propagate toxic content, such as hate speech, malicious advertisements, and pornography, which can have highly negative impacts (e.g., harmful effects on teen mental health). Researchers and practitioners have been enthusiastically developing and extensively deploying textual content moderation software to address this problem. However, we find that malicious users can evade moderation by changing only a few words in the toxic content. Moreover, the performance of modern content moderation software against such malicious inputs remains underexplored. To this end, we propose MTTM, a Metamorphic Testing framework for Textual content Moderation software. Specifically, we conduct a…

Code & Models

Repositories

jarviswang94/mttm (official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Software Testing and Debugging Techniques · Software Engineering Research

Methods: Test