From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling

Mohsinul Kabir; Tasfia Tahsin; Sophia Ananiadou

arXiv:2505.12381 · cs.CL · November 14, 2025

TL;DR

This paper investigates how different language model architectures, specifically transformers and n-gram LMs, propagate bias, and how data provenance, model design, and temporal factors influence bias amplification.

Contribution

The paper introduces a methodology grounded in comparative behavioral theory to analyze the origins of bias, revealing architecture-specific sensitivities and the impact of data provenance on bias propagation.

Findings

01

n-gram LMs are highly sensitive to context window size in bias propagation, whereas transformers show architectural robustness to it.

02

The temporal provenance of training data significantly affects bias.

03

Certain biases are disproportionately amplified depending on model architecture.

Abstract

Current research on bias in language models (LMs) predominantly focuses on data quality, with significantly less attention paid to model architecture and temporal influences of data. Even more critically, few studies systematically investigate the origins of bias. We propose a methodology grounded in comparative behavioral theory to interpret the complex interaction between training data and model architecture in bias propagation during language modeling. Building on recent work that relates transformers to n-gram LMs, we evaluate how data, model design choices, and temporal dynamics affect bias propagation. Our findings reveal that: (1) n-gram LMs are highly sensitive to context window size in bias propagation, while transformers demonstrate architectural robustness; (2) the temporal provenance of training data significantly affects bias; and (3) different model architectures respond…
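To make finding (1) concrete, here is a minimal sketch of why a count-based n-gram LM can be sensitive to context window size. It is a toy illustration under stated assumptions, not the authors' experimental setup: the four-sentence corpus, the function names `train_ngram` and `prob`, and the choice of a pronoun probability as the quantity inspected are all hypothetical.

```python
from collections import Counter, defaultdict

def train_ngram(sentences, n):
    """Count-based n-gram model: maps an (n-1)-token context to next-token counts."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + tokens  # left-pad so early tokens have a context
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return counts

def prob(counts, context, word):
    """Maximum-likelihood estimate of P(word | context); 0.0 for unseen contexts."""
    next_counts = counts.get(tuple(context))
    if not next_counts:
        return 0.0
    return next_counts[word] / sum(next_counts.values())

# Hypothetical toy corpus, invented for illustration only.
corpus = [s.split() for s in [
    "the nurse said she was tired",
    "the nurse said he was tired",
    "the doctor said he was tired",
    "the doctor said he was late",
]]

bigram = train_ngram(corpus, 2)   # context window of 1 token
trigram = train_ngram(corpus, 3)  # context window of 2 tokens

# With one token of context the occupation is invisible to the model.
p_short = prob(bigram, ["said"], "she")                    # 0.25
# With two tokens of context the occupation enters the conditioning.
p_nurse = prob(trigram, ["nurse", "said"], "she")          # 0.5
p_doctor = prob(trigram, ["doctor", "said"], "she")        # 0.0
print(p_short, p_nurse, p_doctor)
```

In this toy setting, widening the window from one token to two changes the estimated pronoun probability from a single occupation-blind value into two sharply different occupation-specific values, which is one plausible mechanism behind the window-size sensitivity the abstract reports for n-gram LMs.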

Taxonomy

Methods: Softmax · Attention Is All You Need