SLR: Automated Synthesis for Scalable Logical Reasoning
Lukas Helff, Ahmad Omar, Felix Friedrich, Antonia Wüst, Hikaru Shindo, Rupert Mitchell, Tim Woydt, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting

TL;DR
SLR is an automated framework for evaluating and training large language models in logical reasoning, creating a comprehensive benchmark and demonstrating improved performance with curriculum learning.
Contribution
Introduces SLR, an automated, scalable method for synthesizing reasoning tasks and a large benchmark, enabling effective training and evaluation of LLMs in logical reasoning.
Findings
Contemporary LLMs produce valid rules but often fail in logical inference.
Recent reasoning LLMs improve accuracy but have high computational costs.
Curriculum learning with SLR doubles Llama-3-8B accuracy on SLR-Bench.
Abstract
We introduce SLR, an end-to-end framework for systematic evaluation and training of Large Language Models (LLMs) via Scalable Logical Reasoning. Given a user's task specification, SLR automatically synthesizes (i) an instruction prompt for an inductive reasoning task, (ii) a validation program, executable on model outputs to provide verifiable rewards, and (iii) the latent ground-truth rule. This process is fully automated, scalable, requires no human annotations, and offers precise control over task difficulty. Using SLR, we create SLR-Bench, a benchmark comprising 19k prompts organized into 20 curriculum levels that progressively increase in relational, arithmetic, and recursive complexity. Large-scale evaluation reveals that contemporary LLMs readily produce syntactically valid rules, yet often fail at correct logical inference. Recent reasoning LLMs demonstrate improved performance…
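The pipeline in the abstract has three artifacts per task: an instruction prompt, a validation program that scores model outputs, and a latent ground-truth rule. The following is a minimal toy sketch of how those pieces fit together. It assumes attribute-dictionary examples and represents rules as Python callables. Everything here is a hypothetical stand-in and not SLR's actual task representation or API: the names `Task`, `synthesize_task`, `validate`, the color/shape/size attributes, and the accuracy-style reward.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy representation: an example is a dict of attributes,
# and a rule is a boolean predicate over one example.
Example = Dict[str, object]
Rule = Callable[[Example], bool]

COLORS = ["red", "blue", "green"]
SHAPES = ["circle", "square", "triangle"]


@dataclass
class Task:
    prompt: str              # (i) instruction prompt shown to the model
    examples: List[Example]  # background examples listed in the prompt
    labels: List[bool]       # positive/negative label of each example
    ground_truth: Rule       # (iii) latent rule, hidden from the model


def sample_example(rng: random.Random) -> Example:
    return {"color": rng.choice(COLORS), "shape": rng.choice(SHAPES),
            "size": rng.randint(1, 5)}


def synthesize_task(seed: int, n_examples: int = 8) -> Task:
    rng = random.Random(seed)
    # Sample a latent rule: "color is C and size >= T" (toy rule family).
    color, threshold = rng.choice(COLORS), rng.randint(2, 4)

    def rule(ex: Example) -> bool:
        return ex["color"] == color and ex["size"] >= threshold

    # Resample until both classes appear, so trivial rules cannot score perfectly.
    while True:
        examples = [sample_example(rng) for _ in range(n_examples)]
        labels = [rule(ex) for ex in examples]
        if any(labels) and not all(labels):
            break
    lines = [f"{ex} -> {'positive' if y else 'negative'}"
             for ex, y in zip(examples, labels)]
    prompt = ("Find a rule that separates positive from negative examples:\n"
              + "\n".join(lines))
    return Task(prompt, examples, labels, rule)


def validate(candidate: Rule, task: Task) -> float:
    """(ii) Toy validation program: fraction of examples classified correctly.

    A stricter binary reward would return 1.0 only if every example is correct.
    """
    correct = 0
    for ex, y in zip(task.examples, task.labels):
        try:
            pred = bool(candidate(ex))
        except Exception:
            return 0.0  # a rule that crashes on execution earns no reward
        correct += pred == y
    return correct / len(task.examples)
```

For example, `task = synthesize_task(seed=0)` yields a prompt to send to the model. The hidden rule always scores `validate(task.ground_truth, task) == 1.0`, and the constant rule `lambda ex: False` scores below 1.0 because every task contains at least one positive example. In this sketch, difficulty could be raised by enlarging the rule family, but that is an illustration rather than SLR's curriculum design.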
Code & Models
- AIML-TUDA/SLR-Bench (dataset, 1.4k downloads)
- AIML-TUDA/SLR-Bench-German (dataset, 2.0k downloads)
- ahmad21omar/SLR-Bench-Spanish (dataset, 8 downloads)
- AIML-TUDA/SLR-Bench-Spanish (dataset, 887 downloads)
- ahmad21omar/SLR-Bench-French (dataset, 9 downloads)
- AIML-TUDA/SLR-Bench-French (dataset, 643 downloads)
- AIML-TUDA/SLR-Bench-Portuguese (dataset, 884 downloads)
- AIML-TUDA/SLR-Bench-Italian (dataset, 184 downloads)
