A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms

Emma Harvey; Rene F. Kizilcec; Allison Koenecke

arXiv:2506.04419·cs.CY·June 6, 2025

TL;DR

This paper introduces a practical framework for auditing LLM-based chatbots for dialect bias by measuring quality-of-service harms. The framework supports multi-turn, dialect-inclusive audits of deployed chatbots and is designed to be usable by a range of auditors.

Contribution

The paper presents a novel, adaptable framework for auditing chatbots for dialect bias. The framework emphasizes real-world interaction and requires only query access to the chatbot.

Findings

01. Rufus, Amazon's LLM-based chatbot, produces lower-quality responses to prompts written in minoritized dialects.

02. Typos in prompts worsen quality-of-service harms.

03. The framework effectively identifies dialect bias in a real-world chatbot.

Abstract

Increasingly, individuals who engage in online activities are expected to interact with large language model (LLM)-based chatbots. Prior work has shown that LLMs can display dialect bias, which occurs when they produce harmful responses when prompted with text written in minoritized dialects. However, whether and how this bias propagates to systems built on top of LLMs, such as chatbots, is still unclear. We conduct a review of existing approaches for auditing LLMs for dialect bias and show that they cannot be straightforwardly adapted to audit LLM-based chatbots due to issues of substantive and ecological validity. To address this, we present a framework for auditing LLM-based chatbots for dialect bias by measuring the extent to which they produce quality-of-service harms, which occur when systems do not work equally well for different people. Our framework has three key…
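
One way to make the idea of a quality-of-service harm concrete is to compare response quality on paired prompts, where each pair expresses the same request in two dialects. The sketch below is an illustration only, not the authors' method: the function name `quality_gap`, the 0-to-1 scoring scale, and the example scores are all hypothetical assumptions and do not come from the paper.

```python
from statistics import mean


def quality_gap(scores_reference, scores_dialect):
    """Mean paired difference in response quality (reference minus dialect).

    Position i in each list holds the score for the same underlying request,
    phrased once in the reference dialect and once in the minoritized dialect.
    A positive result means the dialect prompts received lower-quality
    responses on average. (Hypothetical metric, not taken from the paper.)
    """
    if len(scores_reference) != len(scores_dialect):
        raise ValueError("score lists must be paired and of equal length")
    if not scores_reference:
        raise ValueError("need at least one prompt pair")
    diffs = [r - d for r, d in zip(scores_reference, scores_dialect)]
    return mean(diffs)


# Made-up scores for four prompt pairs (1.0 = fully helpful, 0.0 = unhelpful).
reference = [1.0, 1.0, 0.5, 1.0]
dialect = [1.0, 0.5, 0.5, 0.0]

gap = quality_gap(reference, dialect)
print(gap)  # 0.375
```

A gap near zero would suggest the chatbot serves both dialects about equally well under this scoring. In practice an auditor would also need a defensible way to score responses and a statistical test on the paired differences, both of which are outside this sketch.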

Taxonomy

Topics: AI in Service Interactions · Spam and Phishing Detection · Natural Language Processing Techniques