Beyond Early-Token Bias: Model-Specific and Language-Specific Position Effects in Multilingual LLMs

Mikhail Menschikov; Alexander Kharitonov; Maiia Kotyga; Vadim Porvatov; Anna Zhukovskaya; David Kagramanyan; Egor Shvetsov; Evgeny Burnaev

arXiv:2505.16134·cs.CL·December 15, 2025

TL;DR

This study reveals that multilingual LLMs exhibit model-specific position biases that vary across languages, challenging assumptions about early-token preference and highlighting the complex interaction between position, language, and prompting strategies.

Contribution

It provides a comprehensive analysis of position bias in multilingual LLMs across diverse languages and architectures, uncovering nuanced, model-specific, and language-specific effects.

Findings

1. Position bias is mainly model-driven, with language-specific nuances.

2. Explicitly telling the model which context is most relevant can reduce accuracy when irrelevant distractors are present.

3. Accuracy drops most when the relevant information is in the middle of the context, without an increase in output entropy.
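The entropy part of this finding can be made concrete with a short sketch. It assumes that "output entropy" means the Shannon entropy (in nats) of a softmax distribution over candidate answers. This summary does not state the paper's exact definition, and the logits below are made-up illustrative values, not results from the study.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical logits over four candidate answers.
confident = [8.0, 0.5, 0.2, 0.1]
uncertain = [1.0, 1.0, 1.0, 1.0]

print(round(entropy(softmax(confident)), 3))
print(round(entropy(softmax(uncertain)), 3))  # uniform over 4 answers: ln 4 ≈ 1.386
```

Under this reading, the finding says a model can answer wrongly for mid-context evidence while its distribution stays as peaked (low entropy) as when it answers correctly. In other words, the errors are not accompanied by visible uncertainty.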

Abstract

Large Language Models (LLMs) exhibit position bias, systematically underweighting information based on its location in the context, but how this bias varies across languages and models remains unclear. We conduct a multilingual study across five typologically diverse languages (English, Russian, German, Hindi, Vietnamese) and five model architectures, analyzing how position bias interacts with prompting strategies and affects output entropy. Our key findings are: (1) Position bias is primarily model-driven but shows language-specific nuances. Notably, Qwen2.5-7B-Instruct, DeepSeek 7B Chat and Mistral 7B consistently favor late positions, challenging the common assumption of a universal early-token preference. (2) Explicitly instructing the model, in the presence of irrelevant distractors, that "the most relevant context to the query is marked as 1" unexpectedly reduces accuracy across all…
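The position-controlled setup the abstract describes can be sketched as a small evaluation loop. The loop moves the relevant context through every slot among irrelevant distractors and records accuracy per slot. This is a minimal sketch under stated assumptions, not the authors' code. The names `build_context`, `format_prompt`, `accuracy_by_position`, and `first_passage_reader` are hypothetical. The numbered-passage template is hypothetical as well, and scoring is assumed to be exact match. The toy "model" reads only the first passage, so it shows the early-token bias that the paper finds is not universal.

```python
from collections import defaultdict
from typing import Callable, Sequence

# (relevant passage, distractors, question, gold answer)
Example = tuple[str, Sequence[str], str, str]


def build_context(relevant: str, distractors: Sequence[str], position: int) -> list[str]:
    # Insert the relevant passage at a 0-based index among the distractors.
    if not 0 <= position <= len(distractors):
        raise ValueError("position out of range")
    passages = list(distractors)
    passages.insert(position, relevant)
    return passages


def format_prompt(passages: Sequence[str], question: str) -> str:
    # Hypothetical template: passages numbered from 1, then the question.
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"{numbered}\n\nQuestion: {question}\nAnswer:"


def accuracy_by_position(
    examples: Sequence[Example], answer_fn: Callable[[str], str]
) -> dict[int, float]:
    # Exact-match accuracy for each 0-based slot of the relevant passage.
    hits: defaultdict[int, int] = defaultdict(int)
    totals: defaultdict[int, int] = defaultdict(int)
    for relevant, distractors, question, gold in examples:
        for pos in range(len(distractors) + 1):
            prompt = format_prompt(build_context(relevant, distractors, pos), question)
            totals[pos] += 1
            hits[pos] += int(answer_fn(prompt).strip() == gold)
    return {pos: hits[pos] / totals[pos] for pos in sorted(totals)}


def first_passage_reader(prompt: str) -> str:
    # Toy stand-in for an LLM that only looks at passage [1].
    first_line = prompt.splitlines()[0]
    return "Paris" if "Paris" in first_line else "unknown"


examples: list[Example] = [
    (
        "The capital of France is Paris.",
        ["Berlin is in Germany.", "Rome is in Italy."],
        "What is the capital of France?",
        "Paris",
    )
]
result = accuracy_by_position(examples, first_passage_reader)
print(result)  # {0: 1.0, 1: 0.0, 2: 0.0}
```

Replacing `first_passage_reader` with a call to a real model would show which pattern that model actually follows. The abstract reports that some models favor late positions rather than early ones.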