Analysis of LLM as a grammatical feature tagger for African American English
Rahul Porwal, Alice Rozet, Pryce Houck, Jotsna Gowda, Sarah Moeller, Kevin Tang

TL;DR
This study evaluates how well different NLP models, including large language models, identify key grammatical features of African American English, revealing their strengths and biases, and emphasizing the need for better training methods.
Contribution
The paper systematically compares rule-based, transformer-based, and large language models on identifying two AAE grammatical features, Habitual Be and Multiple Negation, and highlights the current limitations and biases of these models.
Findings
LLMs show promise compared to the baseline but are affected by biases.
Biases include recency effects and sensitivity to unrelated text features such as formality.
Improved model training and architectural adjustments are needed to better accommodate AAE.
Abstract
African American English (AAE) presents unique challenges in natural language processing (NLP). This research systematically compares the performance of available NLP models--rule-based, transformer-based, and large language models (LLMs)--capable of identifying key grammatical features of AAE, namely Habitual Be and Multiple Negation. These features were selected for their distinct grammatical complexity and frequency of occurrence. The evaluation involved sentence-level binary classification tasks, using both zero-shot and few-shot strategies. The analysis reveals that while LLMs show promise compared to the baseline, they are influenced by biases such as recency and unrelated features in the text such as formality. This study highlights the necessity for improved model training and architectural adjustments to better accommodate AAE's unique linguistic characteristics. Data and code…
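The sentence-level binary classification setup described in the abstract, with zero-shot and few-shot prompting, can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the prompt wording, the function names (`build_prompt`, `parse_answer`, `reorder_for_recency`), and the example sentences are all assumptions made for illustration and are not taken from the paper or its dataset. No model is called; the sketch only shows how prompts could be built and how a reply could be mapped to a binary label.

```python
from typing import List, Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompts are not shown here.
INSTRUCTION = (
    "Does the following text contain the African American English "
    "feature '{feature}'? Answer Yes or No."
)

Example = Tuple[str, bool]  # (sentence, feature present?)


def build_prompt(sentence: str, feature: str,
                 examples: Sequence[Example] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt."""
    lines = [INSTRUCTION.format(feature=feature), ""]
    for text, label in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Answer: {'Yes' if label else 'No'}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(reply: str) -> bool:
    """Map a free-text model reply to a binary label (True = feature present)."""
    return reply.strip().lower().startswith("yes")


def reorder_for_recency(examples: Sequence[Example],
                        last_label: bool) -> List[Example]:
    """Move examples with `last_label` to the end (stable order otherwise).

    Comparing predictions across both orderings is one simple way to probe
    whether the label of the most recent example sways the model.
    """
    return sorted(examples, key=lambda ex: ex[1] == last_label)


# Illustrative sentences only (assumed, not from the study's data).
shots = [
    ("He be working late.", True),
    ("She is working right now.", False),
]
zero_shot = build_prompt("They be running every morning.", "Habitual Be")
few_shot = build_prompt("They be running every morning.", "Habitual Be", shots)
```

Sending `zero_shot` or `few_shot` to a model and passing the reply through `parse_answer` would yield one binary prediction per sentence; running the few-shot prompt with both `reorder_for_recency(shots, True)` and `reorder_for_recency(shots, False)` gives a pair of predictions whose disagreement hints at a recency effect.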
Taxonomy
Topics: Lexicography and Language Studies · Linguistics, Language Diversity, and Identity
