How AI Fails: An Interactive Pedagogical Tool for Demonstrating Dialectal Bias in Automated Toxicity Models

Subhojit Ghimire

arXiv:2511.06676 · cs.CL · April 2, 2026


TL;DR

This paper quantifies the bias of a toxicity detection model against African-American English and introduces an interactive tool that shows how human-set thresholds turn biased scores into discriminatory moderation decisions.

Contribution

It provides a quantitative benchmark of bias in toxicity models and presents a pedagogical tool to illustrate the impact of bias in AI moderation.

Findings

01

The toxicity model (unitary/toxic-bert) scores African-American English (AAE) text as 1.8 times more toxic, on average, than Standard American English (SAE) text.

02

The disparity is more pronounced for the "identity hate" category, where AAE text scores 8.8 times higher on average.

03

The interactive tool demonstrates how human-set thresholds can perpetuate discrimination.
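
The findings above combine two ideas: a ratio of average scores between dialect groups, and a threshold that converts scores into flag decisions. The following is a minimal sketch of that relationship, not the paper's benchmark code. The score lists are made-up values chosen so that the mean ratio comes out at roughly 1.8, and the helper names `mean_ratio` and `flag_rate` are hypothetical.

```python
from statistics import mean

# Hypothetical toxicity scores in [0, 1]. These are NOT data from the paper;
# they are illustrative values picked so the group means differ by about 1.8x.
sae_scores = [0.05, 0.10, 0.20, 0.30, 0.35]
aae_scores = [0.10, 0.20, 0.35, 0.55, 0.60]


def mean_ratio(group_a, group_b):
    """Ratio of the average score of group_a to the average score of group_b."""
    return mean(group_a) / mean(group_b)


def flag_rate(scores, threshold):
    """Fraction of texts whose score is at or above the moderation threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


print(f"mean score ratio (AAE / SAE): {mean_ratio(aae_scores, sae_scores):.2f}")

# Sweep a few thresholds to see how the same scores produce different outcomes.
for threshold in (0.3, 0.5, 0.9):
    sae_rate = flag_rate(sae_scores, threshold)
    aae_rate = flag_rate(aae_scores, threshold)
    print(f"threshold={threshold}: SAE flagged {sae_rate:.0%}, AAE flagged {aae_rate:.0%}")
```

In this toy data, the gap in flag rates depends on where the threshold sits: a strict setting can flag only one group, while a very lenient one flags neither. That is the sense in which a human-chosen threshold, rather than the score alone, determines who is affected.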

Abstract

Now that AI-driven moderation has become pervasive in everyday life, we often hear claims that "the AI is biased". While this is often said jokingly, the light-hearted remark reflects a deeper concern. How can we be certain that an online post flagged as "inappropriate" was not simply the victim of a biased algorithm? This paper investigates this problem using a dual approach. First, I conduct a quantitative benchmark of a widely used toxicity model (unitary/toxic-bert) to measure performance disparity between text in African-American English (AAE) and Standard American English (SAE). The benchmark reveals a clear, systematic bias: on average, the model scores AAE text as 1.8 times more toxic and 8.8 times higher for "identity hate". Second, I introduce an interactive pedagogical tool that makes these abstract biases tangible. The tool's core mechanic, a user-controlled "sensitivity…
