Identifying Bias in Machine-generated Text Detection
Kevin Stowe, Svetlana Afanaseva, Rodolfo Raimundo, Yitao Sun, Kailash Patil

TL;DR
This paper investigates biases in machine-generated text detection systems, revealing disparities across demographic groups and highlighting the importance of addressing fairness in detection models.
Contribution
The study systematically assesses biases in 16 detection systems across multiple attributes, providing insights into their fairness and limitations.
Findings
Several models tend to classify essays written by disadvantaged groups as machine-generated.
Essays by English-language learners (ELL) are more likely to be classified as machine-generated.
Humans perform poorly at detection but show no significant biases.
Abstract
The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the capability to identify whether a given text was generated using a model or written by a person. While detection models show strong performance, they have the capacity to cause significant negative impacts. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and power of the effects, as well as performing subgroup analysis. We find that while biases are generally inconsistent across systems, there are several key issues: several models tend…
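To make the subgroup analysis mentioned in the abstract concrete, here is a minimal sketch of one way to compare how often a detector flags human-written essays from two groups. It is not the authors' method: the paper uses regression-based models, and this sketch swaps in a simpler two-proportion z-test. The function names, the group labels, and the counts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import erf, sqrt


def subgroup_false_positive_rates(records):
    """Share of human-written essays flagged as machine-generated, per group.

    `records` is an iterable of (group, flagged) pairs, where `flagged` is
    True when a detector labeled a human-written essay as machine-generated.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for group, flagged in records:
        counts[group][0] += int(flagged)
        counts[group][1] += 1
    return {group: f / n for group, (f, n) in counts.items()}


def two_proportion_z_test(flagged_a, n_a, flagged_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). This is a simplified stand-in for the
    regression-based analysis described in the abstract.
    """
    p_a, p_b = flagged_a / n_a, flagged_b / n_b
    pooled = (flagged_a + flagged_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Standard normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Synthetic, made-up counts (NOT results from the paper).
records = (
    [("ELL", True)] * 30 + [("ELL", False)] * 70
    + [("non-ELL", True)] * 10 + [("non-ELL", False)] * 90
)
rates = subgroup_false_positive_rates(records)
z, p_value = two_proportion_z_test(30, 100, 10, 100)
print(rates, round(z, 2), p_value)
```

A regression with all four attributes as predictors, as the abstract describes, would additionally control for overlap between attributes, which a pairwise test like this one cannot do.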
