Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

Eve Fleisig; Genevieve Smith; Madeline Bossi; Ishita Rustagi; Xavier Yin; Dan Klein

arXiv:2406.08818·cs.CL·September 18, 2024·2 cites

Open Access

TL;DR

This study finds that ChatGPT models default to "standard" varieties of English and respond to non-"standard" varieties with more stereotyping, demeaning content, condescension, and comprehension failures, reinforcing dialect discrimination.

Contribution

The paper provides a large-scale analysis of dialect bias in ChatGPT, demonstrating how models reinforce dialect discrimination and comparing performance between GPT-3.5 and GPT-4.

Findings

01

Models default to standard English dialects.

02

Responses to non-"standard" varieties show more stereotyping (19% worse) and demeaning content (25% worse) than responses to "standard" varieties.

03

GPT-4 exhibits more stereotyping than GPT-3.5 despite improved comprehension.

Abstract

We present a large-scale study of linguistic bias exhibited by ChatGPT covering ten dialects of English (Standard American English, Standard British English, and eight widely spoken non-"standard" varieties from around the world). We prompted GPT-3.5 Turbo and GPT-4 with text by native speakers of each variety and analyzed the responses via detailed linguistic feature annotation and native speaker evaluation. We find that the models default to "standard" varieties of English; based on evaluation by native speakers, we also find that model responses to non-"standard" varieties consistently exhibit a range of issues: stereotyping (19% worse than for "standard" varieties), demeaning content (25% worse), lack of comprehension (9% worse), and condescending responses (15% worse). We also find that if these models are asked to imitate the writing style of prompts in non-"standard" varieties,…

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling

Methods: Cosine Annealing · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing