GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?

Yang Zhou; Hongyi Liu; Zhuoming Chen; Yuandong Tian; Beidi Chen

arXiv:2502.05252·cs.CL·February 11, 2025

TL;DR

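This paper introduces GSM-Infinite, a synthetic benchmark of grade school math problems whose context length and reasoning complexity can be scaled without bound under fine-grained control. Evaluating existing LLMs on it reveals a sigmoid decline in performance as complexity grows and an inference scaling trend in which exponentially more compute buys only linear gains.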

Contribution

The paper presents GSM-Infinite, a scalable, controllable benchmark for systematically assessing LLM reasoning in long, complex contexts, and analyzes performance trends with increasing difficulty.

Findings

01 Performance declines sigmoidally as reasoning complexity increases.

02 Exponentially increasing inference compute yields only linear performance improvements.

03 Current LLMs face fundamental limitations in reasoning over long, complex contexts.
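The first two findings describe curve shapes, which a small numeric sketch can make concrete. The functions below are illustrative only: the names `sigmoid_accuracy` and `accuracy_vs_compute` and every parameter value (`midpoint`, `steepness`, `base`, `gain_per_doubling`) are assumptions chosen for the example, not values fitted or reported by the paper.

```python
import math

def sigmoid_accuracy(complexity, midpoint=30.0, steepness=0.2):
    """Hypothetical accuracy curve: close to 1 for easy problems, close to 0 for hard ones.

    `midpoint` is the complexity at which accuracy crosses 0.5 (an assumed value).
    """
    return 1.0 / (1.0 + math.exp(steepness * (complexity - midpoint)))

def accuracy_vs_compute(compute, base=0.2, gain_per_doubling=0.05):
    """Hypothetical trend: every doubling of inference compute adds the same increment.

    Compute grows exponentially (1, 2, 4, 8, ...) while accuracy grows linearly.
    """
    return min(1.0, base + gain_per_doubling * math.log2(compute))

if __name__ == "__main__":
    for n_ops in (10, 30, 50):
        print("complexity", n_ops, "accuracy", round(sigmoid_accuracy(n_ops), 3))
    for compute in (1, 2, 4, 8):
        print("compute", compute, "accuracy", round(accuracy_vs_compute(compute), 3))
```

Under these assumed parameters, accuracy falls through 0.5 at the midpoint complexity, and going from 1 to 2 units of compute helps exactly as much as going from 4 to 8.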

Abstract

Long-context large language models (LLMs) have recently shown strong performance in information retrieval and long-document QA. However, to tackle the most challenging intellectual problems, LLMs must reason effectively in long and complex contexts (e.g., frontier mathematical research). Studying how LLMs handle increasing reasoning complexity and context length is essential, yet existing benchmarks lack a solid basis for quantitative evaluation. Inspired by the abstraction of GSM-8K problems as computational graphs, and the ability to introduce noise by adding unnecessary nodes and edges, we develop a grade school math problem generator capable of producing arithmetic problems with infinite difficulty and context length under fine-grained control. Using our newly synthesized GSM-Infinite benchmark, we comprehensively evaluate existing LLMs. We find a consistent sigmoid decline in…
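The generator idea in the abstract (a problem as a computational graph, with noise added as unnecessary nodes and edges) can be sketched in a few lines. This is a toy illustration and not the authors' generator: `generate_problem`, the chain-shaped graph, the `v`/`n` node names, and the sentence templates are all assumptions made for the example.

```python
import random

def generate_problem(num_ops, num_noise, seed=0):
    """Return (problem_text, answer) for a toy chain-shaped computational graph.

    num_ops   controls reasoning complexity (length of the essential chain).
    num_noise controls context length (extra nodes the answer never depends on).
    """
    rng = random.Random(seed)
    values = {"v0": rng.randint(1, 9)}  # root node with a literal value
    statements = [f"v0 is {values['v0']}."]

    # Essential nodes: each one depends on the previous node in the chain.
    for i in range(1, num_ops + 1):
        delta = rng.randint(1, 9)
        sign = rng.choice([1, -1])
        values[f"v{i}"] = values[f"v{i-1}"] + sign * delta
        relation = "more" if sign == 1 else "less"
        statements.append(f"v{i} is {delta} {relation} than v{i-1}.")

    # Noise nodes: they read from existing nodes, but no essential node reads from them.
    for j in range(num_noise):
        parent = rng.choice(sorted(values))
        delta = rng.randint(1, 9)
        values[f"n{j}"] = values[parent] + delta
        statements.append(f"n{j} is {delta} more than {parent}.")

    rng.shuffle(statements)  # hide the essential chain among the distractors
    question = f"What is v{num_ops}?"
    return " ".join(statements + [question]), values[f"v{num_ops}"]

if __name__ == "__main__":
    text, answer = generate_problem(num_ops=3, num_noise=2, seed=7)
    print(text)
    print("answer:", answer)
```

Because the essential chain is drawn before any noise node, calling the function with the same `seed` and `num_ops` but a larger `num_noise` lengthens the context without changing the answer, which is the kind of independent control over length and complexity the abstract describes.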

Code & Models

Repositories

Infini-AI-Lab/gsm_infinite (Official)

Taxonomy

Topics: Digital Rights Management and Security · Multi-Agent Systems and Negotiation