Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services
David Hartmann, Amin Oueslati, Dimitri Staufer

TL;DR
This study systematically evaluates four cloud-based content moderation services for bias and fairness. It finds performance disparities across target identity groups and persistent difficulty in detecting implicit hate speech, and it argues for greater transparency and bias mitigation.
Contribution
The paper provides a third-party audit of four commercial content moderation services, highlighting biases and performance gaps, particularly in implicit hate speech detection and group fairness.
Findings
All services struggled with implicit hate speech detection.
Biases against LGBTQ+ and PoC groups persist.
Biases against women have been largely addressed.
Abstract
Online platforms face the challenge of moderating an ever-increasing volume of content, including harmful hate speech. In the absence of clear legal definitions and a lack of transparency regarding the role of algorithms in shaping decisions on content moderation, there is a critical need for external accountability. Our study contributes to filling this gap by systematically evaluating four leading cloud-based content moderation services through a third-party audit, highlighting issues such as biases against minorities and vulnerable groups that may arise through over-reliance on these services. Using a black-box audit approach and four benchmark data sets, we measure performance in explicit and implicit hate speech detection as well as counterfactual fairness through perturbation sensitivity analysis and present disparities in performance for certain target identity groups and data…
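The perturbation sensitivity analysis mentioned in the abstract can be illustrated with a minimal sketch. The idea is to fill the same sentence template with different identity terms and measure how much a moderation score changes, since a counterfactually fair scorer should change little. This sketch is not the authors' implementation. The function names, the `{identity}` slot, and the `toy_score` scorer are all hypothetical, and `toy_score` merely stands in for a call to a real moderation service.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List


def perturbation_sensitivity(
    templates: List[str],
    identity_terms: List[str],
    score_fn: Callable[[str], float],
    slot: str = "{identity}",
) -> Dict[str, float]:
    """Average per-template score range and standard deviation.

    For each template, the slot is replaced by every identity term and the
    resulting sentences are scored. Larger values mean the scorer reacts
    more strongly to the identity term alone.
    """
    ranges, deviations = [], []
    for template in templates:
        scores = [score_fn(template.replace(slot, term)) for term in identity_terms]
        ranges.append(max(scores) - min(scores))
        deviations.append(pstdev(scores))
    return {"score_range": mean(ranges), "score_dev": mean(deviations)}


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a moderation API (assumption, not a real service).

    It deliberately raises the score when one identity term appears,
    to show what a biased scorer looks like under this analysis.
    """
    lowered = text.lower()
    score = 0.1
    if "hate" in lowered:
        score += 0.5
    if "gay" in lowered:
        score += 0.2  # bias that is injected on purpose for the demonstration
    return score


if __name__ == "__main__":
    templates = ["I am a {identity} person.", "I hate {identity} people."]
    terms = ["gay", "straight", "tall"]
    print(perturbation_sensitivity(templates, terms, toy_score))
```

In an actual audit, `score_fn` would wrap the service under test, and the templates and identity terms would come from a benchmark data set rather than a hand-written list.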
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Privacy, Security, and Data Protection · Freedom of Expression and Defamation
