BAID: A Benchmark for Bias Assessment of AI Detectors
Priyam Basu, Yunfeng Zhang, Vipul Raheja

TL;DR
This paper introduces BAID, a comprehensive framework for evaluating biases in AI-generated text detectors across diverse sociolinguistic factors, revealing significant disparities and emphasizing the need for bias-aware assessment.
Contribution
It presents BAID, a new scalable evaluation framework with over 200,000 samples to systematically assess biases in AI detectors across multiple sociolinguistic categories.
Findings
Detected consistent disparities in detection performance across groups.
Found low recall rates for texts from underrepresented groups.
Highlighted the importance of bias-aware evaluation for AI detectors.
Abstract
AI-generated text detectors have recently gained adoption in educational and professional contexts. Prior research has uncovered isolated cases of bias, particularly against English Language Learners (ELLs); however, there is a lack of systematic evaluation of such systems across broader sociolinguistic factors. In this work, we propose BAID, a comprehensive evaluation framework for assessing various types of bias in AI detectors. As part of the framework, we introduce over 200k samples spanning 7 major categories: demographics, age, educational grade level, dialect, formality, political leaning, and topic. We also generate synthetic versions of each sample with carefully crafted prompts that preserve the original content while reflecting subgroup-specific writing styles. Using this data, we evaluate four open-source state-of-the-art AI text detectors and find consistent disparities in detection…
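The findings above report disparities as low recall for some subgroups. One common way to quantify this is to compute recall on the AI-generated samples separately for each subgroup and then take the gap between the best- and worst-served groups. The sketch below is a minimal illustration under that assumption only. The record layout (group, label, prediction), the functions `per_group_recall` and `recall_gap`, and the toy data are all hypothetical and do not describe BAID's actual code or metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (subgroup, true_label, predicted_label),
# where label 1 means "AI-generated" and 0 means "human-written".
Record = Tuple[str, int, int]


def per_group_recall(records: Iterable[Record]) -> Dict[str, float]:
    """Recall on AI-generated samples, computed separately per subgroup."""
    true_pos = defaultdict(int)  # AI-generated samples flagged as AI
    positives = defaultdict(int)  # all AI-generated samples
    for group, label, pred in records:
        if label == 1:
            positives[group] += 1
            if pred == 1:
                true_pos[group] += 1
    # Only groups with at least one AI-generated sample get a recall value.
    return {g: true_pos[g] / positives[g] for g in positives}


def recall_gap(recalls: Dict[str, float]) -> float:
    """Difference between the highest and lowest per-group recall."""
    return max(recalls.values()) - min(recalls.values())


if __name__ == "__main__":
    # Toy, made-up predictions for two hypothetical subgroups.
    toy = [
        ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 0), ("group_a", 0, 0),
        ("group_b", 1, 1), ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 1, 0),
    ]
    recalls = per_group_recall(toy)
    print(recalls, recall_gap(recalls))
```

A max-minus-min gap is only one choice of disparity summary; ratios or per-group comparisons against the overall recall are equally plausible alternatives.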
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Ethics and Social Impacts of AI · Computational and Text Analysis Methods
