ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation

Zhuojie Yang; Wentao Wan; Keze Wang

arXiv:2603.21140 · cs.AI · March 24, 2026 · AAAI

TL;DR

ORACLE is a framework that combines large language models with symbolic reasoning to generate step-wise verified reasoning data, improving reasoning performance across six benchmarks.

Contribution

ORACLE introduces a structured data generation method that pairs LLM generation with symbolic verification, so that multi-step reasoning data for natural language tasks can be checked step by step.

Findings

01. Outperforms strong baselines on six reasoning benchmarks

02. Enhances intermediate step verification in synthetic reasoning data

03. Improves reasoning accuracy across logical, factual, and commonsense tasks

Abstract

Training large language models (LLMs) with synthetic reasoning data has become a popular approach to enhancing their reasoning capabilities, while a key factor influencing the effectiveness of this paradigm is the quality of the generated multi-step reasoning data. To generate high-quality reasoning data, many recent methods generate synthetic reasoning paths and filter them based on final answer correctness, often overlooking flaws in intermediate reasoning steps. To enhance the verification of intermediate reasoning steps, prior work primarily resorts to code execution or symbolic reasoning engines. However, code-based validation is restricted to code or mathematical tasks, and reasoning engines require a well-structured and complete context. As a result, existing methods fail to function effectively in natural language reasoning tasks that involve ambiguous or incomplete contexts. In…
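
The gap the abstract describes, between filtering on final-answer correctness and verifying intermediate steps, can be illustrated with a toy sketch. This is not ORACLE's pipeline: the `Step` type, the arithmetic checker, and the example paths are all hypothetical, and arithmetic stands in for the symbolic verification that the paper applies to natural language reasoning.

```python
from dataclasses import dataclass
from operator import add, mul, sub

# Hypothetical toy setting: each reasoning step is one arithmetic claim.
OPS = {"+": add, "-": sub, "*": mul}


@dataclass(frozen=True)
class Step:
    left: int
    op: str
    right: int
    claimed: int  # the result the reasoning path asserts for this step


def step_is_valid(step: Step) -> bool:
    """Check a single step by recomputing it (stand-in for a step verifier)."""
    return OPS[step.op](step.left, step.right) == step.claimed


def final_answer_filter(paths, gold):
    """Keep paths whose last claimed value matches the gold answer."""
    return [p for p in paths if p and p[-1].claimed == gold]


def stepwise_filter(paths, gold):
    """Keep paths with a correct final answer AND every step verified."""
    return [
        p for p in final_answer_filter(paths, gold)
        if all(step_is_valid(s) for s in p)
    ]


# Toy problem: (2 + 3) * 4, gold answer 20.
path_a = [Step(2, "+", 3, 5), Step(5, "*", 4, 20)]  # every step correct
path_b = [Step(2, "+", 3, 6), Step(6, "*", 4, 20)]  # flawed steps, right answer
path_c = [Step(2, "+", 3, 5), Step(5, "*", 4, 25)]  # wrong final answer
PATHS = [path_a, path_b, path_c]

kept_by_answer = final_answer_filter(PATHS, gold=20)
kept_by_steps = stepwise_filter(PATHS, gold=20)
```

In this toy case the final-answer filter keeps both `path_a` and `path_b`, while the step-wise filter keeps only `path_a`. `path_b` is the kind of sample, correct at the end but flawed in between, that answer-only filtering lets into the training data.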

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Constraint Satisfaction and Optimization