Logic Haystacks: Probing LLMs Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding)

Damien Sileo

arXiv:2502.17169 · cs.CL · February 25, 2025

TL;DR

This paper evaluates large language models' ability to perform long-context logical reasoning using complex, distractor-filled texts, revealing their effective context window is much smaller than claimed, especially with realistic distractors.

Contribution

It introduces a novel evaluation method using lengthy, logic-based texts with distractors to accurately measure LLMs' long-context reasoning capabilities.

Findings

01. With realistic distractors, evidence retrieval already breaks down at 128 clauses

02. Models struggle to distinguish relevant evidence in long, homogeneous texts

03. Current evaluations may overestimate models' long-context reasoning abilities

Abstract

Large language models demonstrate promising long-context processing capabilities, with recent models touting context windows close to one million tokens. However, the evaluations supporting these claims often involve simple retrieval tasks or synthetic tasks padded with irrelevant text, which the models may easily detect and discard. In this work, we generate lengthy simplified English text with first-order logic representations spanning up to 2048 clauses (around 25k GPT-4 tokens). We formulate an evaluation task with evidence retrieval for contradiction detection. The long, homogeneous text is filled with distractors that are both hard to distinguish from relevant evidence and provably do not interfere with it. Our evaluation of evidence retrieval shows that the effective context window is much smaller with realistic distractors, already crumbling at 128 clauses.
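The setup described in the abstract can be illustrated with a toy sketch. This is not the paper's generator: the sentence templates, the names `build_haystack` and `retrieval_scores`, and the vocabulary are all assumptions made for illustration. Distractors reuse the same templates as the evidence but draw on a disjoint vocabulary, which is only a crude stand-in for the provable non-interference the abstract mentions. For scale, the abstract's figures (2048 clauses, around 25k tokens) work out to roughly 12 tokens per clause.

```python
import random


def build_haystack(n_clauses, seed=0):
    """Return (clauses, evidence_positions, hypothesis) for a toy haystack.

    Hypothetical helper: two evidence clauses that jointly contradict the
    hypothesis are hidden among same-looking distractor clauses.
    """
    if n_clauses < 2:
        raise ValueError("need room for both evidence clauses")
    rng = random.Random(seed)

    # Evidence: together these entail "Alice is tall", contradicting the hypothesis.
    evidence = ["Alice is a baker.", "Every baker is tall."]
    hypothesis = "Alice is not tall."

    # Distractors share the evidence templates but use a disjoint vocabulary,
    # so they cannot support or block the derivation above (toy assumption).
    names = ["Bob", "Carol", "Dave", "Erin"]
    nouns = ["pilot", "farmer", "singer", "judge"]
    adjectives = ["quiet", "brave", "old", "kind"]

    distractors = []
    for _ in range(n_clauses - len(evidence)):
        if rng.random() < 0.5:
            distractors.append(f"{rng.choice(names)} is a {rng.choice(nouns)}.")
        else:
            distractors.append(f"Every {rng.choice(nouns)} is {rng.choice(adjectives)}.")

    # Place the evidence at random positions, keeping its relative order.
    positions = sorted(rng.sample(range(n_clauses), len(evidence)))
    evidence_iter, distractor_iter = iter(evidence), iter(distractors)
    clauses = [
        next(evidence_iter) if i in positions else next(distractor_iter)
        for i in range(n_clauses)
    ]
    return clauses, positions, hypothesis


def retrieval_scores(predicted, gold):
    """Precision and recall of predicted evidence indices against gold indices."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

In use, one would number the clauses in a prompt, ask a model which clause indices contradict the hypothesis, and compare its answer to `evidence_positions` with `retrieval_scores` while growing `n_clauses` (for example 128 up to 2048).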

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer