Algorithmic Arbitrariness in Content Moderation
Juan Felipe Gomez, Caio Vieira Machado, Lucas Monteiro Paes, and Flavio P. Calmon

TL;DR
This paper investigates how algorithmic arbitrariness in machine learning-based content moderation can lead to inconsistent and potentially unjust restrictions on speech, highlighting risks to human rights and the need for increased transparency.
Contribution
It experimentally demonstrates predictive multiplicity in state-of-the-art models and analyzes its impact on social groups and human rights in content moderation.
Findings
Predictive multiplicity causes arbitrary toxicity classifications.
Arbitrariness disproportionately affects certain social groups.
Model multiplicity can be more ambiguous than human judgment.
Abstract
Machine learning (ML) is widely used to moderate online content. Despite its scalability relative to human moderation, the use of ML introduces unique challenges to content moderation. One such challenge is predictive multiplicity: multiple competing models for content classification may perform equally well on average, yet assign conflicting predictions to the same content. This multiplicity can result from seemingly innocuous choices during model development, such as random seed selection for parameter initialization. We experimentally demonstrate how content moderation tools can arbitrarily classify samples as toxic, leading to arbitrary restrictions on speech. We discuss these findings in terms of human rights set out by the International Covenant on Civil and Political Rights (ICCPR), namely freedom of expression, non-discrimination, and procedural justice. We analyze (i) the…
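To make the idea of predictive multiplicity concrete, the following is a minimal sketch, not the paper's experimental setup. It assumes a synthetic two-dimensional dataset as a stand-in for toxic / non-toxic text, and trains several small logistic regression models that differ only in their random seed (initialization and sample order). All names here (`make_data`, `train`, `predict`, `accuracy`, `ambiguity`) are hypothetical and chosen for illustration. The `ambiguity` function computes the fraction of samples that receive conflicting predictions from at least two models, which is one common way to quantify multiplicity.

```python
import math
import random


def make_data(n, seed):
    """Synthetic 2-D points with noisy binary labels (stand-in for toxic / non-toxic)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        # The label depends on x[0] + x[1] plus noise, so the classes overlap.
        y = 1 if x[0] + x[1] + rng.gauss(0, 1) > 0 else 0
        data.append((x, y))
    return data


def train(data, seed, epochs=3, lr=0.1):
    """Logistic regression via SGD; the seed sets initialization and sample order."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1), rng.gauss(0, 1)]
    b = rng.gauss(0, 1)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b


def predict(model, x):
    """Hard 0/1 decision at the 0.5 probability threshold."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def accuracy(preds, data):
    """Fraction of samples whose prediction matches the label."""
    return sum(1 for p, (_, y) in zip(preds, data) if p == y) / len(data)


def ambiguity(all_preds):
    """Fraction of samples on which at least two models disagree."""
    n = len(all_preds[0])
    conflicts = sum(1 for j in range(n) if len({p[j] for p in all_preds}) > 1)
    return conflicts / n


train_set = make_data(500, seed=0)
test_set = make_data(500, seed=1)

# Same data, same architecture, same hyperparameters; only the seed changes.
models = [train(train_set, seed=s) for s in range(5)]
all_preds = [[predict(m, x) for x, _ in test_set] for m in models]
accuracies = [accuracy(p, test_set) for p in all_preds]
amb = ambiguity(all_preds)

print("accuracy per seed:", [round(a, 3) for a in accuracies])
print("ambiguity:", round(amb, 3))
```

If the per-seed accuracies come out close to one another while the ambiguity is above zero, the sketch reproduces the phenomenon in miniature: models that look interchangeable on average still disagree on individual samples, so which samples get flagged depends on an arbitrary choice of seed.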
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Sparse Evolutionary Training
