Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma; Haiteng Zhao; Junlei Zhang; Junxian He; Lingpeng Kong

arXiv:2410.17195 · cs.AI · October 29, 2024

TL;DR

This paper introduces Predictive-Decoding, a novel approach that applies Model Predictive Control to Large Language Model decoding. It significantly improves reasoning and planning accuracy and is more computationally efficient than search baselines.

Contribution

It proposes a new method, Predictive-Decoding, that enhances LLM planning by incorporating foresight and optimal-control principles, addressing the inherently myopic nature of autoregressive decoding.

Findings

01. Improved reasoning and planning accuracy across diverse tasks.

02. Enhanced computational efficiency over search baselines.

03. Effective mitigation of early errors in LLM planning.

Abstract

Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in domains such as mathematical problem-solving and coding, LLMs struggle to plan reliably and optimally because autoregressive decoding is inherently myopic. This paper revisits LLM reasoning from an optimal-control perspective and proposes a novel method, Predictive-Decoding, that leverages Model Predictive Control to improve planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements across a wide range of math, coding, and agent tasks. Furthermore, Predictive-Decoding demonstrates computational efficiency, outperforming search baselines…
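The core idea in the abstract can be illustrated with a toy example. The model's next-step distribution is re-weighted using foresight trajectories, the decoder commits only to the first step, and then it re-plans, as in receding-horizon control. The sketch below is not the authors' implementation (the official repository listed on this page has that). It rests on several assumptions:

- A hypothetical hand-written next-step table `TOY_LM` stands in for an LLM.
- Each candidate step is scored by its own log-probability plus the log-probability of a foresight rollout of length `horizon`.
- Candidates are re-weighted by `exp(score / tau)`.
- Greedy rollouts replace sampled foresight so the output is deterministic.

The function names are illustrative only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "language model": maps a prefix (tuple of steps) to a
# next-step distribution. Prefixes missing from the table are terminal.
ToyLM = Dict[Tuple[str, ...], Dict[str, float]]

TOY_LM: ToyLM = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "z": 0.5},
    ("B",): {"y": 0.99, "w": 0.01},
}


def greedy_rollout(lm: ToyLM, prefix: Tuple[str, ...], horizon: int) -> float:
    """Extend `prefix` greedily for up to `horizon` steps; return the summed log-prob."""
    logp = 0.0
    for _ in range(horizon):
        dist = lm.get(prefix)
        if not dist:  # terminal prefix: nothing left to predict
            break
        step, p = max(dist.items(), key=lambda kv: kv[1])
        logp += math.log(p)
        prefix = prefix + (step,)
    return logp


def predictive_step(
    lm: ToyLM, prefix: Tuple[str, ...], horizon: int, tau: float
) -> Dict[str, float]:
    """Re-weight each candidate next step by exp(foresight score / tau)."""
    scores = {}
    for step, p in lm[prefix].items():
        # Assumption: score = log p(step | prefix) + log p(foresight | prefix + step),
        # i.e. the foresight is scored by its own likelihood under the model.
        scores[step] = math.log(p) + greedy_rollout(lm, prefix + (step,), horizon)
    m = max(scores.values())  # subtract the max for numerical stability
    weights = {s: math.exp((v - m) / tau) for s, v in scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def decode(lm: ToyLM, horizon: int = 2, tau: float = 1.0) -> List[str]:
    """Receding-horizon loop: plan with foresight, commit only the first step, re-plan."""
    prefix: Tuple[str, ...] = ()
    while lm.get(prefix):
        reweighted = predictive_step(lm, prefix, horizon, tau)
        best = max(reweighted, key=reweighted.get)
        prefix = prefix + (best,)
    return list(prefix)


def greedy_decode(lm: ToyLM) -> List[str]:
    """Myopic baseline: always take the locally most likely next step."""
    prefix: Tuple[str, ...] = ()
    while lm.get(prefix):
        dist = lm[prefix]
        prefix = prefix + (max(dist, key=dist.get),)
    return list(prefix)


print(greedy_decode(TOY_LM))  # ['A', 'x']
print(decode(TOY_LM))         # ['B', 'y']
```

Greedy decoding commits to `A` because it is locally most likely (0.6), even though every continuation of `A` is uncertain. With a two-step foresight, the trajectory `B → y` (0.4 × 0.99 = 0.396) outscores `A → x` (0.6 × 0.5 = 0.3), so the re-weighted decoder picks `B`. This illustrates the kind of early error that non-myopic decoding is meant to avoid.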

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

1. The idea of making the LLM think long-term (non-myopically) makes sense for LLM planning and reasoning.
2. The authors appear to run experiments on several different test beds.

Weaknesses

1. The presentation of this paper could be improved (for instance, Fig. 1 contains too many figures and too much text; could you make it simpler?).
2. The introduction of the "Myopic Gap" is interesting, but the section could be rewritten for better readability.
3. The math (MATH and GSM8K) experiments lack larger LLM sizes.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. Investigates an important problem, namely short-sightedness.
2. Good performance-budget balance in the experiments.

Weaknesses

1. Could you provide a case study? (The paper states that focusing solely on historical information can lead to irreversible mistakes and potential planning failures.)
2. The formulation in lines 126-127 is not a POMDP; it lacks a mapping from global states to local observations.
3. I'm concerned that the authors' hypotheses in lines 188-191 are untenable. They seem to amount to saying that "the more confident a given model is in an answer, the more likely that answer is to be correct." In that case, shouldn't the greedy answer usually be selected?

Reviewer 03 · Rating 8 · Confidence 2

Strengths

- The application of Model Predictive Control to mitigate myopia in LLMs is a novel approach that enhances planning accuracy.
- The paper provides a solid foundation and supports its claims with experimental results.
- The paper is well organized, with clear explanations of the problem, methodology, and results.

Weaknesses

See questions

Code & Models

Repositories

chang-github-00/llm-predictive-decoding (Official)

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Speech and dialogue systems · Artificial Intelligence in Games