TL;DR
This paper explores how biases in algorithmic content moderation systems reflect normative differences across demographic groups, affecting fairness and diversity in online discussions.
Contribution
It introduces methods to measure normative biases in moderation algorithms and demonstrates their application through a case study using demographic-labeled comment data.
Findings
Classifiers trained on labels from different demographic groups of raters differ in their performance on test sets.
Normative biases influence moderation outcomes and can impact diversity of online discourse.
Bias measurement methods can inform more equitable content moderation practices.
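One way to picture the kind of comparison the findings describe is a cross-group evaluation: train one classifier per rater group, then score each classifier against every group's test labels. The sketch below is an illustration only, not the paper's method. The groups "A" and "B", the comments, and all labels are invented toy data, and the small naive Bayes model is a stand-in for whatever classifier a real study would use.

```python
from collections import Counter
import math


def train_nb(texts, labels):
    """Fit a tiny multinomial naive Bayes model (labels are 0 or 1)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return 1 (offensive) or 0 (not offensive), using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for c in (0, 1):
        n_c = sum(word_counts[c].values())
        score = math.log((class_counts[c] + 1) / (total + 2))
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][word] + 1) / (n_c + len(vocab)))
        scores[c] = score
    return 1 if scores[1] > scores[0] else 0


def accuracy(model, texts, labels):
    """Fraction of texts where the model's prediction matches the label."""
    hits = sum(predict_nb(model, t) == y for t, y in zip(texts, labels))
    return hits / len(texts)


def cross_group_accuracy(train_texts, train_labels, test_texts, test_labels):
    """Map (training group, evaluation group) to accuracy."""
    models = {g: train_nb(train_texts, y) for g, y in train_labels.items()}
    return {
        (tg, eg): accuracy(models[tg], test_texts, test_labels[eg])
        for tg in models
        for eg in test_labels
    }


# Invented toy data: hypothetical group "B" rates mild swearing as offensive,
# hypothetical group "A" does not. Both agree on direct insults.
train_texts = [
    "you are an idiot",
    "thanks for the helpful edit",
    "this is damn good work",
    "damn fine article",
    "what an idiot move",
    "good work on the article",
]
train_labels = {"A": [1, 0, 0, 0, 1, 0], "B": [1, 0, 1, 1, 1, 0]}

test_texts = ["damn good edit", "you idiot", "helpful article thanks"]
test_labels = {"A": [0, 1, 0], "B": [1, 1, 0]}

results = cross_group_accuracy(train_texts, train_labels, test_texts, test_labels)
for (trained_on, evaluated_on), acc in sorted(results.items()):
    print(f"trained on {trained_on}, evaluated on {evaluated_on}: {acc:.2f}")
```

On this toy data each model agrees most closely with the group whose labels it was trained on, and the off-diagonal gap comes entirely from the one comment the two groups rate differently. A real measurement would need far more data, a held-out split per group, and a check that the gap exceeds ordinary rater noise.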
Abstract
The internet has become a central medium through which 'networked publics' express their opinions and engage in debate. Offensive comments and personal attacks can inhibit participation in these spaces. Automated content moderation aims to overcome this problem using machine learning classifiers trained on large corpora of texts manually annotated for offence. While such systems could help encourage more civil debate, they must navigate inherently normatively contestable boundaries, and are subject to the idiosyncratic norms of the human raters who provide the training data. An important objective for platforms implementing such measures might be to ensure that they are not unduly biased towards or against particular norms of offence. This paper provides some exploratory methods by which the normative biases of algorithmic content moderation systems can be measured, by way of a case…
