MTTM: Metamorphic Testing for Textual Content Moderation Software
Wenxuan Wang, Jen-tse Huang, Weibin Wu, Jianping Zhang, Yizhan Huang, Shuqing Li, Pinjia He, Michael Lyu

TL;DR
This paper introduces MTTM, a metamorphic testing framework for evaluating and improving textual content moderation software's ability to detect toxic content, especially when malicious users make minimal modifications.
Contribution
The paper presents a novel metamorphic testing approach for textual content moderation software, built from metamorphic relations and text perturbations, and demonstrates its effectiveness in revealing vulnerabilities and enhancing model robustness.
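To make the idea concrete, here is a minimal sketch of one metamorphic relation of the kind described above: if a seed text is flagged as toxic, a lightly perturbed variant that a human would read the same way should still be flagged. Everything in it is an assumption made for illustration, not the paper's implementation: the `LOOKALIKES` table, the `perturb` function, the keyword-based `toy_moderator`, and the `find_violations` helper are all hypothetical names.

```python
from typing import Callable, Iterable, List

# Hypothetical look-alike substitutions; illustrative only, not from the paper.
LOOKALIKES = {"a": "@", "i": "1", "o": "0", "e": "3", "s": "$"}


def perturb(text: str) -> str:
    """Apply one illustrative perturbation: swap characters for look-alikes."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


def toy_moderator(text: str) -> bool:
    """Stand-in for moderation software: flag text containing a blocked word."""
    blocked = {"idiot", "scam"}
    return any(word in blocked for word in text.lower().split())


def find_violations(
    moderate: Callable[[str], bool], seeds: Iterable[str]
) -> List[str]:
    """Return perturbed variants that break the metamorphic relation.

    Relation checked: a seed flagged as toxic must stay flagged after
    perturbation. Seeds that are not flagged are skipped.
    """
    violations = []
    for seed in seeds:
        if not moderate(seed):
            continue  # the relation only applies to seeds already flagged
        variant = perturb(seed)
        if not moderate(variant):
            violations.append(variant)
    return violations


seeds = ["you are an idiot", "this is a scam", "have a nice day"]
violations = find_violations(toy_moderator, seeds)
for variant in violations:
    print("evades moderation:", variant)
```

No labelled oracle is needed here: the check compares the moderator's output on a seed with its output on the variant, which is what makes the test metamorphic. In this sketch, each returned variant is a candidate error in the software under test, and could also serve as an extra training example.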
Findings
MTTM achieves an error finding rate of up to 83.9% when testing commercial moderation software.
It achieves an error finding rate of up to 91.2% when testing state-of-the-art algorithms.
Retraining with MTTM-generated cases improves model robustness significantly.
Abstract
The exponential growth of social media platforms such as Twitter and Facebook has revolutionized textual communication and textual content publication in human society. However, they have been increasingly exploited to propagate toxic content, such as hate speech, malicious advertisement, and pornography, which can lead to highly negative impacts (e.g., harmful effects on teen mental health). Researchers and practitioners have been enthusiastically developing and extensively deploying textual content moderation software to address this problem. However, we find that malicious users can evade moderation by changing only a few words in the toxic content. Moreover, modern content moderation software performance against malicious inputs remains underexplored. To this end, we propose MTTM, a Metamorphic Testing framework for Textual content Moderation software. Specifically, we conduct a…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Software Testing and Debugging Techniques · Software Engineering Research
Methods: Test
