CRISP: Complex Reasoning with Interpretable Step-based Plans
Matan Vetzler, Koren Lazar, Guy Uziel, Eran Hirsch, Ateret Anaby-Tavor, Leshem Choshen

TL;DR
CRISP introduces a multi-domain dataset of high-level plans for complex reasoning tasks and shows that fine-tuning models on this data improves plan quality and reasoning performance in both of its domains, mathematical reasoning and code generation.
Contribution
The paper presents CRISP, a new dataset of automatically generated and validated high-level plans, and shows that fine-tuning models on CRISP improves reasoning and plan quality beyond what few-shot prompting achieves.
Findings
A model fine-tuned on CRISP generates higher-quality plans than larger models using few-shot prompting.
Planning with models fine-tuned on CRISP outperforms Chain-of-Thought reasoning.
Fine-tuning on one CRISP domain improves plan generation in the other domain.
Abstract
Recent advancements in large language models (LLMs) underscore the need for stronger reasoning capabilities to solve complex problems effectively. While Chain-of-Thought (CoT) reasoning has been a step forward, it remains insufficient for many domains. A promising alternative is explicit high-level plan generation, but existing approaches largely assume that LLMs can produce effective plans through few-shot prompting alone, without additional training. In this work, we challenge this assumption and introduce CRISP (Complex Reasoning with Interpretable Step-based Plans), a multi-domain dataset of high-level plans for mathematical reasoning and code generation. The plans in CRISP are automatically generated and rigorously validated, both intrinsically (using an LLM as a judge) and extrinsically (by evaluating their impact on downstream task performance). We demonstrate that fine-tuning a…
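The abstract names two validation signals for generated plans but does not spell out how they are combined. The following is a minimal, hypothetical Python sketch of one way such a two-stage filter could be wired. It assumes each candidate plan already carries an LLM-judge score (the intrinsic check) and downstream solve rates measured with and without the plan (the extrinsic check). The `CandidatePlan` fields, the `validate` function, the 0.7 judge threshold, the "keep only if the plan raises the solve rate" rule, and all numbers are illustrative assumptions, not CRISP's actual procedure or data.

```python
from dataclasses import dataclass


@dataclass
class CandidatePlan:
    # Hypothetical record for one generated plan (not CRISP's real schema).
    problem_id: str
    plan: str                       # high-level, step-based plan text
    judge_score: float              # assumed LLM-as-judge score in [0, 1] (intrinsic signal)
    solve_rate_with_plan: float     # assumed downstream accuracy when the solver sees the plan
    solve_rate_without_plan: float  # assumed baseline accuracy with no plan (extrinsic reference)


def validate(candidates, judge_threshold=0.7):
    """Keep plans that pass both checks; the threshold and rule are assumptions."""
    kept = []
    for c in candidates:
        passes_intrinsic = c.judge_score >= judge_threshold
        # Extrinsic check: the plan must help the downstream solver.
        passes_extrinsic = c.solve_rate_with_plan > c.solve_rate_without_plan
        if passes_intrinsic and passes_extrinsic:
            kept.append(c)
    return kept


# Toy inputs for illustration only (made-up values).
toy_candidates = [
    CandidatePlan("p1", "1. Name the unknowns. 2. Set up equations. 3. Solve.", 0.9, 0.8, 0.5),
    CandidatePlan("p2", "1. Guess an answer. 2. Check it.", 0.9, 0.4, 0.5),   # hurts the solver
    CandidatePlan("p3", "1. Solve.", 0.5, 0.9, 0.2),                          # judged too vague
]

kept = validate(toy_candidates)
print([c.problem_id for c in kept])
```

Requiring both checks means a plan the judge likes but that does not help the solver (p2) is dropped, as is a plan that helps but is judged too vague to be an interpretable high-level plan (p3).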
