The CLRS-Text Algorithmic Reasoning Language Benchmark
Larisa Markeeva, Sean McLeish, Borja Ibarz, Wilfried Bounsi, Olga Kozlova, Alex Vitvitskyi, Charles Blundell, Tom Goldstein, Avi Schwarzschild, Petar Veličković

TL;DR
CLRS-Text is a new textual benchmark inspired by classical algorithms, designed to evaluate and improve language models' reasoning capabilities across diverse algorithmic tasks with a standardized, extensible framework.
Contribution
It introduces a textual version of the CLRS benchmark, enabling procedural generation of algorithmic reasoning tasks for language models, and provides a platform for evaluating and advancing LM reasoning.
Findings
Fine-tuned LMs perform variably on CLRS-Text tasks.
The benchmark reveals new challenges in LM reasoning.
The framework allows easy addition of new algorithmic tasks.
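Here is one way the "easy addition of new algorithmic tasks" idea can look in practice: a minimal Python sketch of a task registry. It is hypothetical. `TASKS`, `register_task`, and `render` are illustrative names, not the CLRS-Text API, and the text layout is an assumption. In this sketch each task is a single function that returns its execution trace, so adding a task means adding one decorated function.

```python
from typing import Callable, Dict, List

# Hypothetical registry: task name -> function returning the execution trace
# (the list of intermediate states) for a given input.
TraceFn = Callable[[List[int]], List[List[int]]]
TASKS: Dict[str, TraceFn] = {}


def register_task(name: str) -> Callable[[TraceFn], TraceFn]:
    """Decorator that stores a trace function in the registry under `name`."""
    def decorator(fn: TraceFn) -> TraceFn:
        TASKS[name] = fn
        return fn
    return decorator


@register_task("bubble_sort")
def bubble_sort_trace(xs: List[int]) -> List[List[int]]:
    """Record the array before sorting and after every swap."""
    arr = list(xs)
    states = [list(arr)]
    n = len(arr)
    for i in range(n):
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                states.append(list(arr))
    return states


@register_task("minimum")
def minimum_trace(xs: List[int]) -> List[List[int]]:
    """Record the running minimum after reading each element."""
    states: List[List[int]] = []
    current = None
    for x in xs:
        current = x if current is None else min(current, x)
        states.append([current])
    return states


def render(name: str, xs: List[int]) -> str:
    """Turn any registered task's trace into one text string."""
    states = TASKS[name](xs)
    steps = " | ".join(" ".join(map(str, s)) for s in states)
    return f"{name}:\ninput: {' '.join(map(str, xs))}\ntrace: {steps}"
```

For example, `render("bubble_sort", [3, 1, 2])` produces `bubble_sort:\ninput: 3 1 2\ntrace: 3 1 2 | 1 3 2 | 1 2 3`. The `render` function never has to change when a new task is registered. That separation is the design property the sketch is meant to show.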
Abstract
Eliciting reasoning capabilities from language models (LMs) is a critical direction on the path towards building intelligent systems. Most recent studies dedicated to reasoning focus on out-of-distribution performance on procedurally-generated synthetic benchmarks, bespoke-built to evaluate specific skills only. This trend makes results hard to transfer across publications, slowing down progress. Three years ago, a similar issue was identified and rectified in the field of neural algorithmic reasoning, with the advent of the CLRS benchmark. CLRS is a dataset generator comprising graph execution traces of classical algorithms from the Introduction to Algorithms textbook. Inspired by this, we propose CLRS-Text -- a textual version of these algorithmic traces. Out of the box, CLRS-Text is capable of procedurally generating trace data for thirty diverse, challenging algorithmic tasks across…
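To make "a textual version of algorithmic traces" concrete, the Python sketch below does the following:

- It runs insertion sort on a randomly generated input.
- It records the array after each outer-loop step.
- It serializes the input and the trace into a prompt/target text pair.

The function names and the text layout (`key:`, `trace:`, `answer:`) are assumptions for illustration. They are not the actual CLRS-Text prompt format or code.

```python
import random
from typing import List, Optional, Tuple


def insertion_sort_trace(keys: List[float]) -> List[List[float]]:
    """Run insertion sort and record the array after each outer-loop step."""
    arr = list(keys)
    trace = [list(arr)]  # state before any step
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for `key`.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
        trace.append(list(arr))
    return trace


def fmt(xs: List[float]) -> str:
    """Render a list of floats with three decimals, e.g. [0.500 0.100]."""
    return "[" + " ".join(f"{x:.3f}" for x in xs) + "]"


def to_text_example(keys: List[float]) -> Tuple[str, str]:
    """Serialize the input and its trace as a (prompt, target) text pair."""
    trace = insertion_sort_trace(keys)
    prompt = f"insertion_sort:\nkey: {fmt(keys)}\ntrace:"
    steps = " | ".join(fmt(state) for state in trace[1:])
    target = f"{steps}\nanswer: {fmt(trace[-1])}"
    return prompt, target


def sample_example(n: int, seed: Optional[int] = None) -> Tuple[str, str]:
    """Procedurally generate one example from a random input of length n."""
    rng = random.Random(seed)
    keys = [round(rng.random(), 3) for _ in range(n)]
    return to_text_example(keys)
```

For example, `to_text_example([0.5, 0.1, 0.3])` yields this target: `[0.100 0.500 0.300] | [0.100 0.300 0.500]`, followed by `answer: [0.100 0.300 0.500]`. Varying `n` in `sample_example` is one simple way to produce inputs larger than those seen in training. Larger inputs are the kind of out-of-distribution test the abstract refers to.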
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Focus
