Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
Kuniaki Saito, Kihyuk Sohn, Chen-Yu Lee, Yoshitaka Ushiku

TL;DR
This paper investigates positional bias in language models, revealing that models struggle to extract information from later parts of documents due to auto-regressive training, and proposes regularization methods to improve extraction across positions.
Contribution
It uncovers the positional bias in LLMs' knowledge extraction and demonstrates how regularization can mitigate this issue, advancing understanding of model training effects.
Findings
Models answer first sentence questions accurately.
Extraction from middle/end of documents is challenging.
Regularization improves extraction from diverse positions.
Abstract
Large language models require updates to remain up-to-date or adapt to new domains by fine-tuning them with new documents. One key is memorizing the latest information in a way that the memorized information is extractable with a query prompt. However, LLMs suffer from a phenomenon called perplexity curse; despite minimizing document perplexity during fine-tuning, LLMs struggle to extract information through a prompt sentence. In this new knowledge acquisition and extraction, we find a very intriguing fact that LLMs can accurately answer questions about the first sentence, but they struggle to extract information described in the middle or end of the documents used for fine-tuning. Our study suggests that the auto-regressive training causes this issue; each token is prompted by reliance on all previous tokens, which hinders the model from recalling information from training documents by…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Seismology and Earthquake Studies
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Linear Layer · Multi-Head Attention · Residual Connection · Weight Decay · Byte Pair Encoding
