Layer-wise Positional Bias in Short-Context Language Modeling
Maryam Rahimi, Mahdi Nouri, Yadollah Yaghoobzadeh

TL;DR
This paper introduces an attribution-based framework to analyze how positional biases evolve across layers in short-context language models, revealing stable, architecture-specific importance profiles and biases like recency and primacy.
Contribution
It presents a novel layer-wise analysis method for positional biases in language models, uncovering stable importance profiles and their relation to model depth and architecture.
Findings
Recency bias increases with model depth.
Primacy bias diminishes with model depth.
Early layers favor content words over function words.
Abstract
Language models often show a preference for using information from specific positions in the input regardless of semantic relevance. While positional bias has been studied in various contexts, from attention sinks to task performance degradation in long-context settings, prior work has not established how these biases evolve across individual layers and input positions, or how they vary independent of task complexity. We introduce an attribution-based framework to analyze positional effects in short-context language modeling. Using layer conductance with a sliding-window approach, we quantify how each layer distributes importance across input positions, yielding layer-wise positional importance profiles. We find that these profiles are architecture-specific, stable across inputs, and invariant to lexical scrambling. Characterizing these profiles, we find prominent recency bias that…
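As a rough illustration of what a layer-wise positional importance profile could look like in practice, the sketch below aggregates per-position attribution scores from several sliding windows into a single profile for one layer, then measures how much importance falls on the first and last positions. It is not the authors' implementation. The abstract does not say how window-level scores are normalized or averaged, so the normalization, the function names `positional_profile` and `edge_bias`, and the toy scores are all assumptions made for illustration. The scores are assumed to have been computed beforehand by some attribution method such as layer conductance.

```python
from statistics import fmean


def positional_profile(window_scores):
    """Average normalized per-position importance across windows.

    window_scores: list of windows for ONE layer; each window is a list of
    attribution scores, one per position, and all windows share one length.
    Hypothetical aggregation: the source does not specify this step.
    """
    if not window_scores:
        raise ValueError("need at least one window")
    width = len(window_scores[0])
    normalized = []
    for scores in window_scores:
        if len(scores) != width:
            raise ValueError("all windows must have the same length")
        total = sum(abs(s) for s in scores)
        if total == 0:
            continue  # skip windows that carry no attribution mass
        # Use magnitudes so each window's importances sum to 1.
        normalized.append([abs(s) / total for s in scores])
    if not normalized:
        raise ValueError("every window had zero attribution")
    # Column-wise mean: one value per position within the window.
    return [fmean(column) for column in zip(*normalized)]


def edge_bias(profile, k=1):
    """Return (primacy, recency): importance share on the first / last k positions."""
    return sum(profile[:k]), sum(profile[-k:])


# Toy, made-up scores for two windows of width 3 at a single layer.
toy_windows = [[1, 1, 2], [0, 1, 3]]
profile = positional_profile(toy_windows)
primacy, recency = edge_bias(profile, k=1)
print(profile, primacy, recency)
```

Running the same aggregation separately for each layer would give one profile per layer, which is the kind of object that a comparison of recency and primacy across depth needs.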
Taxonomy
Topics: Topic Modeling · Neurobiology of Language and Bilingualism · Multimodal Machine Learning Applications
