Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction

Kuniaki Saito; Kihyuk Sohn; Chen-Yu Lee; Yoshitaka Ushiku

arXiv:2402.12170 · cs.CL · April 21, 2025 · 1 citation

TL;DR

This paper investigates positional bias in language models, revealing that models struggle to extract information from later parts of documents due to auto-regressive training, and proposes regularization methods to improve extraction across positions.

Contribution

It uncovers the positional bias in LLMs' knowledge extraction and demonstrates how regularization can mitigate this issue, advancing understanding of model training effects.

Findings

01  Models accurately answer questions about the first sentence of a fine-tuning document.

02  Extracting information from the middle or end of a document is much harder.

03  Regularization improves extraction across positions in the document.

Abstract

Large language models must be updated, by fine-tuning on new documents, to stay current or to adapt to new domains. One key requirement is memorizing the latest information in a way that makes it extractable with a query prompt. However, LLMs suffer from a phenomenon called the perplexity curse: despite minimizing document perplexity during fine-tuning, they struggle to extract information through a prompt sentence. In this setting of new knowledge acquisition and extraction, we find an intriguing fact: LLMs can accurately answer questions about the first sentence, but they struggle to extract information described in the middle or at the end of the documents used for fine-tuning. Our study suggests that auto-regressive training causes this issue; each token is prompted by, and so relies on, all previous tokens, which hinders the model from recalling information from training documents by…
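
The dependence on previous tokens described in the abstract can be made concrete with a small sketch. This is an illustration only, not the authors' code: the toy document, the whitespace tokenization, and the helper name `autoregressive_pairs` are all assumptions. It lists the (context, target) pairs that next-token training sees, and shows that a fact later in the document is always learned behind a longer prefix than a fact in the first sentence.

```python
from typing import List, Tuple


def autoregressive_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Return the (context, target) pairs used by next-token prediction.

    Each target token is paired with ALL tokens that precede it.
    """
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]


# Hypothetical two-sentence document; whitespace splitting stands in for a
# real tokenizer.
doc = "Ada was born in 1815 . Ada wrote the first program .".split()
pairs = autoregressive_pairs(doc)

# Context the model saw when learning the fact in the first sentence.
first_fact_ctx = next(ctx for ctx, tgt in pairs if tgt == "1815")
# Context the model saw when learning the fact in the second sentence.
second_fact_ctx = next(ctx for ctx, tgt in pairs if tgt == "program")

print(len(first_fact_ctx), first_fact_ctx)
print(len(second_fact_ctx), second_fact_ctx)
```

At query time a question prompt supplies none of the earlier sentences, so the later fact is requested without the prefix it was trained behind, while the first-sentence fact was learned with almost no preceding context.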

Code & Models

Repositories

omron-sinicx/whereistheanswer
PyTorch · Official

Datasets

omron-sinicx/wiki2023_plus
dataset · 69 downloads

Taxonomy

Topics: Topic Modeling · Seismology and Earthquake Studies

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Linear Layer · Multi-Head Attention · Residual Connection · Weight Decay · Byte Pair Encoding