CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
Tianlong Wang, Junzhe Chen, Xueting Han, Jing Bai

TL;DR
This paper introduces Critical Plan Step Learning (CPL), a novel reinforcement learning approach that improves large language models' reasoning and generalization across diverse tasks by focusing on critical plan steps through MCTS and advantage-based optimization.
Contribution
The paper proposes CPL, which combines MCTS-based search over plan steps with step-level advantage optimization to improve LLM reasoning and generalization beyond task-specific training.
Findings
Significant performance improvements on GSM8K (+10.5%) and MATH (+6.5%).
Improved performance on out-of-domain reasoning benchmarks such as HumanEval (+12.2%).
Effective learning of critical plan steps improves reasoning capabilities.
Abstract
Post-training, particularly reinforcement learning (RL) using self-play-generated data, has become a new learning paradigm for large language models (LLMs). However, scaling RL to develop a general reasoner remains a research challenge, as existing methods focus on task-specific reasoning without adequately addressing generalization across a broader range of tasks. Moreover, unlike traditional RL with limited action space, LLMs operate in an infinite space, making it crucial to search for valuable and diverse strategies to solve problems effectively. To address this, we propose searching within the action space on high-level abstract plans to enhance model generalization and introduce Critical Plan Step Learning (CPL), comprising: 1) searching on plan, using Monte Carlo Tree Search (MCTS) to explore diverse plan steps in multi-step reasoning tasks, and 2) learning critical plan steps…
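The two components named in the abstract, searching over plan steps with MCTS and scoring individual steps by advantage, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `PlanNode` structure, the UCT constant `c`, the example plan-step strings, and in particular the definition of a step's advantage as the child's mean value minus the parent's mean value. The abstract does not give the authors' exact formulation, so this should not be read as their method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One plan step in a toy search tree (hypothetical structure)."""
    step: str
    visits: int = 0
    value_sum: float = 0.0
    children: "list[PlanNode]" = field(default_factory=list)

    @property
    def value(self) -> float:
        # Mean of the rewards backed up through this node; 0.0 if unvisited.
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(parent: PlanNode, child: PlanNode, c: float = 1.4) -> float:
    """Standard UCT: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited plan steps first
    return child.value + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_child(parent: PlanNode, c: float = 1.4) -> PlanNode:
    """Pick the child plan step with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch, c))


def backup(path: "list[PlanNode]", reward: float) -> None:
    """Propagate a rollout reward to every node on the path."""
    for node in path:
        node.visits += 1
        node.value_sum += reward


def step_advantages(parent: PlanNode) -> "dict[str, float]":
    """Assumed advantage: child mean value minus parent mean value."""
    return {ch.step: ch.value - parent.value for ch in parent.children}


# Toy tree: two candidate first plan steps for one problem.
root = PlanNode("root")
step_a = PlanNode("set up an equation")
step_b = PlanNode("guess and check")
root.children = [step_a, step_b]

# Pretend rollouts: 1.0 means the final answer was correct, 0.0 means wrong.
for reward in (1.0, 1.0, 0.0):
    backup([root, step_a], reward)
backup([root, step_b], 0.0)

advantages = step_advantages(root)
```

In this sketch a plan step with a large positive advantage is one that raises the expected outcome relative to its parent state, which is one plausible way to read "critical" plan steps; steps with negative advantage would be down-weighted during training.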
Taxonomy
Topics: AI-based Problem Solving and Planning · Semantic Web and Ontologies · Machine Learning and Data Classification
Methods: Focus
