Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Taiming Lu, Muhan Gao, Kuai Yu, Adam Byerly, Daniel Khashabi

TL;DR
This paper investigates why large language models struggle with long contexts. Probing their hidden representations shows that they encode the position of target information but often fail to use it when generating responses, a disconnect between information retrieval and utilization.
Contribution
The study uncovers the 'know but don't tell' phenomenon in LLMs, providing new insights into their internal representations and reasoning failures with long contexts.
Findings
LLMs encode positional information of target data
LLMs often fail to use this positional information when generating responses
Extraction time is analyzed in relation to final accuracy
Abstract
Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a "know but don't tell" phenomenon. We further analyze the relationship between extraction time and final accuracy, offering insights into the underlying mechanics of transformer models.
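To make the idea of probing hidden representations concrete, the following is a minimal sketch and not the authors' implementation. It assumes synthetic vectors in place of real LLM activations, and it swaps in a nearest-centroid linear classifier as the probe; the function names, dimensions, and noise level are all hypothetical choices made for illustration.

```python
import random

def make_synthetic_states(n_samples, n_positions, dim, noise, rng):
    # Hypothetical stand-in for LLM activations: each target position gets
    # one random direction, and every sample is that direction plus noise.
    directions = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_positions)]
    data = []
    for _ in range(n_samples):
        label = rng.randrange(n_positions)
        state = [d + noise * rng.gauss(0, 1) for d in directions[label]]
        data.append((state, label))
    return data

def fit_centroid_probe(train, n_positions, dim):
    # Class means define a linear classifier:
    # score_k(x) = mu_k . x - 0.5 * ||mu_k||^2
    sums = [[0.0] * dim for _ in range(n_positions)]
    counts = [0] * n_positions
    for state, label in train:
        counts[label] += 1
        for i, value in enumerate(state):
            sums[label][i] += value
    weights = [[v / max(c, 1) for v in s] for s, c in zip(sums, counts)]
    biases = [-0.5 * sum(w * w for w in ws) for ws in weights]
    return weights, biases

def predict(probe, state):
    # Return the position index with the highest linear score.
    weights, biases = probe
    scores = [sum(w * x for w, x in zip(ws, state)) + b
              for ws, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(probe, data):
    return sum(predict(probe, s) == y for s, y in data) / len(data)

rng = random.Random(0)
data = make_synthetic_states(n_samples=1000, n_positions=5, dim=32, noise=0.5, rng=rng)
train, test = data[:800], data[800:]
probe = fit_centroid_probe(train, n_positions=5, dim=32)
probe_acc = accuracy(probe, test)
print(f"probe accuracy: {probe_acc:.2f} (chance = {1 / 5:.2f})")
```

With real activations, the probe's held-out accuracy would be compared against the accuracy of the model's generated answers; a probe that recovers the target position where the generated answer is wrong is the kind of gap the abstract calls "know but don't tell".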
Taxonomy
Topics: Advancements in Photolithography Techniques · Electricity Theft Detection Techniques
