Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training
Valentin Lacombe, Valentin Quesnel, Damien Sileo

TL;DR
Reasoning Core is a scalable suite for generating diverse, verifiable symbolic reasoning data across multiple formal domains, enhancing language models' reasoning abilities through curriculum learning and supervised training.
Contribution
The paper introduces a scalable procedural generator for symbolic reasoning tasks that pairs each task with an external solver for verification, offers continuous difficulty control, and is designed for use in both pre-training and post-training of language models.
Findings
Improves downstream reasoning performance when mixed into pre-training.
Enables supervised training with reasoning traces from early stages.
Remains challenging for frontier models such as GPT-5 in zero-shot evaluations.
Abstract
Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators often rely on fixed puzzles or templates and do not deliver the distributional breadth needed at scale. We introduce Reasoning Core, a scalable suite that procedurally generates verifiable symbolic reasoning data across core formal domains: PDDL planning over randomized domains, first-order logic with equality, context-free grammar parsing and generation, causal reasoning over random Bayesian networks, and systems of equations. Each task is paired with an external solver for rigorous verification and admits continuous difficulty control for curriculum design. Examples can optionally include solver-derived reasoning traces, enabling supervised training from the earliest pre-training stages, and…
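The generate-then-verify loop described in the abstract can be illustrated with a minimal sketch for the simplest of the listed domains, systems of equations. This is not the Reasoning Core implementation: the function names `generate_system`, `solve`, and `verify`, the mapping from difficulty to the number of unknowns, and the coefficient ranges are all assumptions made for illustration, and the exact Gaussian elimination below merely stands in for the external solver.

```python
import random
from fractions import Fraction


def solve(coeffs, rhs):
    """Exact Gauss-Jordan elimination; returns None if the matrix is singular."""
    n = len(coeffs)
    # Build the augmented matrix with exact rational arithmetic.
    m = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(coeffs, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None  # no unique solution
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]


def generate_system(difficulty, rng):
    """Hypothetical generator: difficulty in [0, 1] scales the number of unknowns."""
    n = 2 + int(difficulty * 4)  # assumed mapping: 2 to 6 unknowns
    solution = [rng.randint(-9, 9) for _ in range(n)]
    while True:
        coeffs = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        rhs = [sum(c * x for c, x in zip(row, solution)) for row in coeffs]
        # Keep only systems the solver confirms have a unique solution.
        if solve(coeffs, rhs) is not None:
            return coeffs, rhs, solution


def verify(coeffs, rhs, answer):
    """Check a candidate answer against the solver's exact solution."""
    return solve(coeffs, rhs) == [Fraction(a) for a in answer]
```

Under these assumptions, a curriculum would sample `difficulty` from a schedule that rises over training, and `verify` would serve as the correctness check for a model's answer.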
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
