RELIC: Evaluating Complex Reasoning via the Recognition of Languages In-Context
Jackson Petty, Michael Y. Hu, Wentao Wang, Shauli Ravfogel, William Merrill, Tal Linzen

TL;DR
RELIC is a framework that assesses the complex reasoning of large language models by testing whether they can recognize context-free languages. It shows that models fail to scale their reasoning effort and instead shift strategies as task complexity increases.
Contribution
The paper introduces RELIC, a new evaluation method for LLM reasoning based on context-free language recognition. Its difficulty can be scaled by varying grammar size and string length.
Findings
Even advanced reasoning models perform poorly on RELIC and fail to scale their compute with task difficulty.
Models tend to use fewer reasoning tokens and to shift strategies as complexity increases.
Performance drops are linked to a shift from algorithmic solutions to guessing.
Abstract
Large language models (LLMs) are increasingly used to solve complex tasks where they must retrieve and compose many pieces of in-context information in long reasoning chains. For many real-world tasks it is hard to accurately gauge how model performance and strategy change as task complexity grows. To evaluate models' complex reasoning capability in a scalable and verifiable way, we introduce RELIC (Recognition of Languages In-Context), a framework that evaluates an LLM's ability to decide whether a given string belongs to the context-free language (CFL) generated by a grammar presented in-context. CFL recognition allows us to modulate the intrinsic complexity of the problem by varying grammar size and string length and translate this asymptotic complexity into predictions for ideal LLM performance. We find that even the most advanced reasoning models perform poorly on RELIC, not only…
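To make the recognition task concrete, here is a minimal sketch of how membership in a context-free language can be decided. It uses the classic CYK algorithm, which assumes a grammar in Chomsky normal form (CNF). This summary does not say which algorithm, if any, RELIC uses as a reference, and the toy grammar (`BINARY`, `LEXICAL`, generating a^n b^n) and the function name `cyk_recognize` are hypothetical illustrations rather than anything taken from the paper. The sketch shows how both knobs named in the abstract, string length and grammar size, enter the cost of the problem.

```python
# Hypothetical toy grammar in Chomsky normal form; NOT a grammar from RELIC.
# S -> A B | A X ; X -> S B ; A -> a ; B -> b   (generates a^n b^n, n >= 1)
BINARY = {
    ("S", ("A", "B")),
    ("S", ("A", "X")),
    ("X", ("S", "B")),
}
LEXICAL = {
    ("A", "a"),
    ("B", "b"),
}


def cyk_recognize(tokens, binary, lexical, start="S"):
    """Return True iff `tokens` is derivable from `start` in the CNF grammar."""
    n = len(tokens)
    if n == 0:
        return False  # this sketch assumes no S -> epsilon rule
    # chart[i][j] holds the nonterminals that derive the span tokens[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Length-1 spans: apply lexical rules A -> terminal
    for i, tok in enumerate(tokens):
        chart[i][i + 1] = {lhs for lhs, term in lexical if term == tok}
    # Longer spans: combine two adjacent sub-spans with a binary rule A -> B C
    for width in range(2, n + 1):          # span length
        for i in range(0, n - width + 1):  # span start
            j = i + width
            for k in range(i + 1, j):      # split point
                for lhs, (left, right) in binary:
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n]


if __name__ == "__main__":
    for s in ["ab", "aabb", "aab", "ba"]:
        print(s, cyk_recognize(list(s), BINARY, LEXICAL))
```

The three nested span loops visit O(n^3) (start, end, split) triples, and each triple scans every binary rule. The work therefore grows roughly as n^3 times the number of rules: doubling the string length multiplies it by about eight, while adding rules grows it linearly. This is one standard way to see why grammar size and string length give a controllable, asymptotically characterized difficulty dial.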
