RELIC: Evaluating Complex Reasoning via the Recognition of Languages In-Context

Jackson Petty; Michael Y. Hu; Wentao Wang; Shauli Ravfogel; William Merrill; Tal Linzen

arXiv:2506.05205·cs.CL·April 29, 2026

TL;DR

RELIC is a framework that evaluates large language models' complex reasoning by testing whether they can decide if a string belongs to a context-free language whose grammar is given in context. It shows that models fail to scale their reasoning effort, and change strategy, as grammar size and string length grow.

Contribution

The paper introduces RELIC, an evaluation method for LLM reasoning based on context-free language recognition. Task difficulty can be modulated in a scalable, verifiable way by varying grammar size and string length.

Findings

01. Advanced models perform poorly on RELIC, failing to scale compute with task difficulty.

02. Models tend to reduce reasoning tokens and shift strategies as complexity increases.

03. Performance drops are linked to a shift from algorithmic solutions to guessing.

Abstract

Large language models (LLMs) are increasingly used to solve complex tasks where they must retrieve and compose many pieces of in-context information in long reasoning chains. For many real-world tasks it is hard to accurately gauge how model performance and strategy change as task complexity grows. To evaluate models' complex reasoning capability in a scalable and verifiable way, we introduce RELIC (Recognition of Languages In-Context), a framework that evaluates an LLM's ability to decide whether a given string belongs to the context-free language (CFL) generated by a grammar presented in-context. CFL recognition allows us to modulate the intrinsic complexity of the problem by varying grammar size and string length and translate this asymptotic complexity into predictions for ideal LLM performance. We find that even the most advanced reasoning models perform poorly on RELIC, not only…
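
To make the recognition task concrete, here is a minimal sketch of one standard way to decide membership in a context-free language: the CYK algorithm, assuming the grammar is in Chomsky normal form. The abstract does not say which algorithm the authors use as a reference, so treat this as an illustration only. The function name `cyk_recognize`, the rule encoding, and the toy grammar for a^n b^n are all hypothetical and not taken from the paper.

```python
from itertools import product


def cyk_recognize(tokens, unary_rules, binary_rules, start="S"):
    """Return True if `tokens` can be derived from `start`.

    Assumes a grammar in Chomsky normal form (an assumption of this sketch):
      unary_rules:  terminal -> set of nonterminals A with a rule A -> terminal
      binary_rules: (B, C)   -> set of nonterminals A with a rule A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False  # this sketch does not handle the empty string

    # table[i][j] holds the nonterminals that derive tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Spans of length 1: look up terminal rules.
    for i, tok in enumerate(tokens):
        table[i][i] = set(unary_rules.get(tok, ()))

    # Longer spans: try every split point k and combine the two halves.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary_rules.get((b, c), set())

    return start in table[0][n - 1]


# Hypothetical toy grammar for { a^n b^n : n >= 1 } in Chomsky normal form:
#   S -> A B | A X,   X -> S B,   A -> 'a',   B -> 'b'
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

print(cyk_recognize(list("aabb"), UNARY, BINARY))
print(cyk_recognize(list("aab"), UNARY, BINARY))
```

The nested loops over span length, start position, and split point make the running time grow cubically with string length, and the inner rule lookups grow with the number of grammar rules. This is one way to see how varying grammar size and string length changes the intrinsic cost of the problem.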
