The CLRS-Text Algorithmic Reasoning Language Benchmark

Larisa Markeeva, Sean McLeish, Borja Ibarz, Wilfried Bounsi, Olga Kozlova, Alex Vitvitskyi, Charles Blundell, Tom Goldstein, Avi Schwarzschild, Petar Veličković

arXiv:2406.04229 · cs.LG · June 7, 2024

TL;DR

CLRS-Text is a new textual benchmark inspired by classical algorithms, designed to evaluate and improve language models' reasoning capabilities across diverse algorithmic tasks with a standardized, extensible framework.

Contribution

It introduces a textual version of the CLRS benchmark, enabling procedural generation of algorithmic reasoning tasks for language models, and provides a platform for evaluating and advancing LM reasoning.
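To make "procedural generation of algorithmic reasoning tasks" concrete, here is a minimal Python sketch that samples a random input, runs insertion sort, and serializes the intermediate states as text. It is an illustration only: the function names (`insertion_sort_trace`, `format_example`, `generate`) and the `algorithm:` / `input:` / `trace:` / `output:` layout are assumptions made for this example, not the format or API that CLRS-Text actually uses.

```python
import random


def insertion_sort_trace(values):
    """Return the list state after each outer-loop step of insertion sort."""
    arr = list(values)
    states = []
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for key.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
        states.append(list(arr))
    return states


def format_example(values):
    """Serialize one problem as a (prompt, target) pair of strings.

    The textual layout is hypothetical, chosen only for this sketch.
    """
    states = insertion_sort_trace(values)
    final = states[-1] if states else list(values)

    def fmt(xs):
        return " ".join(str(x) for x in xs)

    prompt = f"algorithm: insertion_sort\ninput: {fmt(values)}\n"
    target = "trace: " + " | ".join(fmt(s) for s in states)
    target += f"\noutput: {fmt(final)}"
    return prompt, target


def generate(num_examples, length, seed=0):
    """Procedurally generate examples from a seeded random number generator."""
    rng = random.Random(seed)
    return [
        format_example([rng.randint(0, 9) for _ in range(length)])
        for _ in range(num_examples)
    ]


if __name__ == "__main__":
    prompt, target = format_example([3, 1, 2])
    print(prompt + target)
```

Because inputs are sampled rather than stored, the `length` argument can be changed freely, which is how a generator of this kind can produce test problems larger than those seen during training.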

Findings

01. Fine-tuned LMs perform variably on CLRS-Text tasks.

02. The benchmark reveals new challenges in LM reasoning.

03. The framework allows easy addition of new algorithmic tasks.

Abstract

Eliciting reasoning capabilities from language models (LMs) is a critical direction on the path towards building intelligent systems. Most recent studies dedicated to reasoning focus on out-of-distribution performance on procedurally-generated synthetic benchmarks, bespoke-built to evaluate specific skills only. This trend makes results hard to transfer across publications, slowing down progress. Three years ago, a similar issue was identified and rectified in the field of neural algorithmic reasoning, with the advent of the CLRS benchmark. CLRS is a dataset generator comprising graph execution traces of classical algorithms from the Introduction to Algorithms textbook. Inspired by this, we propose CLRS-Text -- a textual version of these algorithmic traces. Out of the box, CLRS-Text is capable of procedurally generating trace data for thirty diverse, challenging algorithmic tasks across…
