Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville

TL;DR
This paper introduces the bounded attention prefix oracle (BAPO) model, a framework that attributes LLM failures on tasks requiring reasoning over large parts of the input to limited internal communication bandwidth in attention heads, and shows that chain of thought can mitigate those limits.
Contribution
The paper formalizes bandwidth constraints on attention heads using the BAPO model, shows that problems such as graph reachability exceed those limits, and shows how chain of thought can mitigate the resulting reasoning failures.
Findings
In experiments, GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks but fail even on relatively small BAPO-hard tasks.
Chain of thought can convert BAPO-hard problems into BAPO-easy ones.
The BAPO model explains key LLM reasoning failures.
Abstract
Despite their many successes, transformer-based large language models (LLMs) continue to struggle with tasks that require complex reasoning over large parts of their input. We argue that these failures arise due to capacity limits on the accurate flow of information within LLMs. To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs. We show that several important reasoning problems like graph reachability require high communication bandwidth for BAPOs to solve; we call these problems BAPO-hard. Our experiments corroborate our theoretical predictions: GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another benefit of chain of thought (CoT): we prove…
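To make the graph reachability task named in the abstract concrete, here is a minimal Python sketch of the problem itself, solved with breadth-first search. It is not the BAPO model and not the paper's experimental setup; the function name `is_reachable` and the edge-list encoding are assumptions chosen for illustration.

```python
from collections import deque


def is_reachable(edges, source, target):
    """Return True if `target` can be reached from `source` along directed edges.

    `edges` is a list of (u, v) pairs; this encoding is an illustrative
    assumption, not the input format used in the paper.
    """
    # Build an adjacency list from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    # Standard breadth-first search starting from the source node.
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical example graph: 0 -> 1 -> 2, plus a separate edge 3 -> 4.
edges = [(0, 1), (1, 2), (3, 4)]
print(is_reachable(edges, 0, 2))
print(is_reachable(edges, 0, 4))
```

The answer can depend on edges located anywhere in the input, which is the sense in which the task requires reasoning over large parts of the input.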
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Graph Neural Networks
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Softmax · Absolute Position Encodings · Residual Connection
