AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark

Abhay Gupta; Philip Meng; Ece Yurtseven; Sean O'Brien; Kevin Zhu

arXiv:2408.14845 · cs.CL · October 17, 2025

TL;DR

This paper introduces AAVENUE, a benchmark for evaluating large language models' performance on NLU tasks in AAVE versus SAE, revealing biases and emphasizing the need for more inclusive NLP models.

Contribution

The paper presents AAVENUE, a benchmark for assessing LLM bias on NLU tasks in AAVE versus SAE. It extends existing benchmarks such as VALUE by replacing deterministic syntactic and morphological transformations with LLM-based translation using few-shot prompting.

Findings

01 LLMs perform better on SAE versions of the tasks than on their AAVE versions, indicating dialect bias.

02 AAVENUE's LLM-based translations score better on the paper's evaluation metrics than VALUE's translations.

03 The AAVE translations were validated as authentic by fluent speakers.
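The first finding amounts to comparing a model's score on the SAE and AAVE versions of the same task. The sketch below illustrates that comparison with plain accuracy; the function names (`accuracy`, `dialect_gap`) and the toy predictions are hypothetical and are not taken from the paper, which may report different or additional metrics.

```python
from typing import Dict, List


def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def dialect_gap(
    sae_preds: List[int], aave_preds: List[int], labels: List[int]
) -> Dict[str, float]:
    """Compare accuracy on SAE inputs and their AAVE translations.

    Both prediction lists are assumed to cover the same items in the
    same order, so they share one list of gold labels.
    """
    sae_acc = accuracy(sae_preds, labels)
    aave_acc = accuracy(aave_preds, labels)
    # A positive gap means the model does better on SAE than on AAVE.
    return {"sae": sae_acc, "aave": aave_acc, "gap": sae_acc - aave_acc}


# Toy data, invented purely for illustration (not results from the paper).
labels = [1, 0, 1, 1]
sae_preds = [1, 0, 1, 0]   # 3 of 4 correct
aave_preds = [1, 1, 0, 1]  # 2 of 4 correct
result = dialect_gap(sae_preds, aave_preds, labels)
print(result)
```

Because the AAVE items are translations of the SAE items, the gold labels are shared, and any gap reflects the change of dialect rather than a change of task content.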

Abstract

Detecting biases in natural language understanding (NLU) for African American Vernacular English (AAVE) is crucial to developing inclusive natural language processing (NLP) systems. To address dialect-induced performance discrepancies, we introduce AAVENUE (AAVE Natural Language Understanding Evaluation), a benchmark for evaluating large language model (LLM) performance on NLU tasks in AAVE and Standard American English (SAE). AAVENUE builds upon and extends existing benchmarks like VALUE, replacing deterministic syntactic and morphological transformations with a more flexible methodology leveraging LLM-based translation with few-shot prompting, improving performance across our evaluation metrics when translating key tasks from the GLUE and SuperGLUE benchmarks. We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics including…
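The abstract describes translation with few-shot prompting but does not reproduce the prompt here. As a rough illustration of the general technique, the sketch below assembles a few-shot prompt from example pairs. The function name `build_few_shot_prompt`, the instruction wording, and the placeholder examples are all assumptions; the authors' actual prompt and examples are not shown in this summary.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], sentence: str) -> str:
    """Assemble a few-shot translation prompt (hypothetical format).

    `examples` holds (SAE, AAVE) pairs shown to the model as demonstrations;
    `sentence` is the SAE input the model is asked to translate.
    """
    lines = [
        "Translate the following sentence from Standard American English (SAE) "
        "to African American Vernacular English (AAVE), preserving its meaning.",
        "",
    ]
    for sae, aave in examples:
        lines.append(f"SAE: {sae}")
        lines.append(f"AAVE: {aave}")
        lines.append("")
    # The prompt ends with an open "AAVE:" slot for the model to complete.
    lines.append(f"SAE: {sentence}")
    lines.append("AAVE:")
    return "\n".join(lines)


# Placeholder pairs only; real demonstrations would be vetted by fluent speakers.
demo_examples = [
    ("<SAE example 1>", "<AAVE translation 1>"),
    ("<SAE example 2>", "<AAVE translation 2>"),
]
prompt = build_few_shot_prompt(demo_examples, "<SAE sentence from a GLUE task>")
print(prompt)
```

The resulting string would be sent to whichever LLM performs the translation; that call is left out because the summary does not specify the model or API used for this step.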

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training