Non-myopic Generation of Language Models for Reasoning and Planning
Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong

TL;DR
This paper introduces Predictive-Decoding, an approach that applies Model Predictive Control to Large Language Models, improving their reasoning and planning accuracy while using compute more efficiently than search baselines.
Contribution
It proposes Predictive-Decoding, a method that improves LLM planning by incorporating foresight and optimal-control principles to counter the myopia of autoregressive decoding.
Findings
Improved reasoning and planning accuracy across diverse tasks.
Enhanced computational efficiency over search baselines.
Effective mitigation of early errors in LLM planning.
Abstract
Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in domains such as mathematical problem-solving and coding, LLMs struggle to plan reliably and optimally because autoregressive decoding is inherently myopic. This paper revisits LLM reasoning from an optimal-control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements across a wide range of math, coding, and agent tasks. Furthermore, Predictive-Decoding demonstrates computational efficiency, outperforming search baselines…
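The re-weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `foresight_step`, `propose`, `rollout`, and `score`, the softmax re-weighting, and the toy integer "steps" are all assumptions made for illustration. The sketch samples a few candidate next steps, extends each with a short foresight rollout, scores the rollouts, and samples the next step from the re-weighted distribution. As in Model Predictive Control, only the first step is committed before re-planning.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn foresight scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def foresight_step(prefix, propose, rollout, score,
                   num_candidates=4, horizon=3, temperature=1.0, rng=None):
    """Pick one next step by re-weighting candidates with rollout scores."""
    rng = rng or random.Random(0)
    # Sample candidate next steps from the (hypothetical) base model.
    candidates = [propose(prefix, rng) for _ in range(num_candidates)]
    # Look ahead `horizon` steps from each candidate and score the result.
    scores = [score(rollout(prefix + [c], horizon, rng)) for c in candidates]
    weights = softmax(scores, temperature)
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return chosen, candidates, weights

# Toy stand-ins for an LLM (hypothetical): steps are integers and a
# trajectory is scored by how close its sum lands to TARGET.
TARGET = 10

def propose(prefix, rng):
    return rng.choice([1, 2, 3])

def rollout(prefix, horizon, rng):
    # Crude foresight: repeat the latest step for `horizon` more steps.
    return prefix + [prefix[-1]] * horizon

def score(trajectory):
    return -abs(TARGET - sum(trajectory))

def decode(num_steps=3, **kwargs):
    rng = random.Random(0)
    plan = []
    for _ in range(num_steps):
        step, _, _ = foresight_step(plan, propose, rollout, score,
                                    rng=rng, **kwargs)
        plan.append(step)  # commit only the first step, then re-plan
    return plan
```

Calling `decode(num_steps=3, temperature=0.01)` concentrates almost all probability on the candidates with the best foresight score, while a larger temperature stays closer to sampling the candidates uniformly. In a real system, `propose` and `rollout` would be calls to the language model and `score` would be some measure of trajectory quality; the choice made in the paper is not specified in this summary.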
Peer Reviews
Decision: ICLR 2025 Poster
1. The idea of making the LLM think long-term (non-myopically) makes sense for LLM planning and reasoning. 2. The authors appear to have run experiments on several different test beds.
1. The presentation of this paper could be improved (for instance, Fig. 1 contains too many sub-figures and too much text; could it be simplified?). 2. The introduction of the "Myopic Gap" is interesting, but the section could be rewritten for better readability. 3. The math and GSM experiments lack larger LLM sizes.
1. Investigates an important problem, i.e., short-sightedness. 2. Good performance-budget balance in experiments.
1. Case study? (Focusing solely on historical information can lead to irreversible mistakes and potential planning failures.) 2. The formulation in lines 126-127 is not a POMDP, and it lacks a mapping from global states to local observations. 3. I'm concerned that the authors' hypotheses in lines 188-191 are untenable. They seem to say that "the more confident the answer is for a given model, the more likely it is to be correct," in which case the greedy answer should usually be selected?
- The application of Model Predictive Control to mitigate myopia in LLMs is a novel approach that enhances planning accuracy. - The paper provides a solid foundation and supports the claims with experimental results. - The paper is well-organized, with clear explanations of the problem, methodology, and results.
See questions
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Speech and dialogue systems · Artificial Intelligence in Games
