Logic Haystacks: Probing LLMs Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding)
Damien Sileo

TL;DR
This paper evaluates large language models' ability to perform long-context logical reasoning using complex, distractor-filled texts, revealing their effective context window is much smaller than claimed, especially with realistic distractors.
Contribution
It introduces a novel evaluation method using lengthy, logic-based texts with distractors to accurately measure LLMs' long-context reasoning capabilities.
Findings
Effective context window shrinks to 128 clauses with distractors
Models struggle to distinguish relevant evidence in long, homogeneous texts
Current evaluations may overestimate models' long-context reasoning abilities
Abstract
Large language models demonstrate promising long context processing capabilities, with recent models touting context windows close to one million tokens. However, the evaluations supporting these claims often involve simple retrieval tasks or synthetic tasks padded with irrelevant text, which the models may easily detect and discard. In this work, we generate lengthy simplified English text with first-order logic representations spanning up to 2048 clauses (around 25k GPT-4 tokens). We formulate an evaluation task with evidence retrieval for contradiction detection. The long, homogeneous text is filled with distractors that are both hard to distinguish from relevant evidences and provably not interfering with them. Our evaluation of evidence retrieval shows that the effective context window is much smaller with realistic distractors, already crumbling at 128 clauses.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
MethodsAttention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer
