Improving Large Language Model Planning with Action Sequence Similarity
Xinran Zhao, Hanie Sedghi, Bernd Bohnet, Dale Schuurmans, Azade Nova

TL;DR
This paper introduces GRASE-DC, a novel exemplar selection method based on action sequence similarity that significantly enhances large language model planning performance across various tasks.
Contribution
It proposes a new exemplar sampling and filtering approach leveraging plan action sequence similarity, improving LLM planning accuracy and generalization.
Findings
GRASE-DC improves planning accuracy by up to 40 points.
It reduces the number of exemplars needed by 27.3%.
Performance boosts are consistent across different LLMs and benchmarks.
Abstract
Planning is essential for artificial intelligence systems to look ahead and proactively determine a course of actions to reach objectives in the virtual and real world. Recent work on large language models (LLMs) sheds light on their planning capability in various tasks. However, it remains unclear what signals in the context influence the model performance. In this work, we explore how to improve the model planning capability through in-context learning (ICL), specifically, what signals can help select the exemplars. Through extensive experiments, we observe that commonly used problem similarity may result in false positives with drastically different plans, which can mislead the model. In response, we propose to sample and filter exemplars leveraging plan side action sequence similarity (AS). We propose GRASE-DC: a two-stage pipeline that first re-samples high AS exemplars and then…
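The idea of preferring exemplars whose plans resemble the target plan can be illustrated with a minimal sketch. It assumes that AS is the length of the longest common subsequence of two action sequences, normalized by the length of the longer one; that normalizer, the function names, and the toy Blocksworld-style actions are all assumptions for illustration, not the paper's definitions. It covers only ranking by AS, not the full two-stage pipeline.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two action sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def action_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed AS score: |LCS| divided by the longer plan's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def rank_exemplars(
    reference_plan: Sequence[str],
    candidates: List[Tuple[str, List[str]]],
    k: int,
) -> List[str]:
    """Return the names of the k candidates whose plans score highest."""
    scored = sorted(
        candidates,
        key=lambda c: action_similarity(reference_plan, c[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Hypothetical plans; the reference could be an initial model-generated plan.
reference = ["pick-up a", "stack a b", "pick-up c", "stack c a"]
candidates = [
    ("ex1", ["unstack b a", "put-down b"]),
    ("ex2", ["pick-up a", "stack a b"]),
    ("ex3", ["pick-up a", "stack a b", "pick-up c", "stack c a"]),
]
top = rank_exemplars(reference, candidates, k=2)
```

Because the score is computed on plans rather than problem descriptions, two problems that read alike but require different action sequences receive a low score, which is the false-positive case the abstract describes.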
Peer Reviews
Decision: ICLR 2025 Poster
1. The proposed approach is intuitive and shows good empirical performance: it selects exemplars based on AS similarity, providing exemplars of a similar type to the test task.
2. The empirical evaluation is extensive, spanning four PDDL tasks and a natural language planning task, and it tests the method across different base models, showcasing the robustness of the approach.
1. The effectiveness of GRASE is highly dependent on the quality of initial plans generated by the LLM with randomly selected exemplars; poor initial plans can lead to compromised AS-based exemplar selection.
2. For setups with validator access, a baseline comparison with rejection sampling could improve the analysis. E.g., under a similar validator query budget, the validator can be used to reject the invalid plans and select the better plan generated by the approach with random exemplars.
- **Originality**: This paper has good originality in that it proposes to focus on action sequence similarity instead of the traditional criteria based on the semantic similarity between task descriptions when performing example selection for LLM in-context learning of planning tasks.
- **Quality**: This paper has high overall quality. Most of the steps in the proposed GRASE-DC pipeline are very clearly described and discussed in the methodology section, and Section 3 as well as the appendix als
1. In the formula on Line 141, shouldn’t there be a ‘| |’ symbol around ‘LCAS(A_i, A_j)’? According to the previous description, LCAS(A_i, A_j) is a sequence, not a number.
2. The notation and definition of the core concept ‘Action Sequence Similarity’ should be defined more clearly and strictly in the paper. Currently some mentions of AS are a little vague and confusing.
- This paper breaks away from traditional task similarity and instead uses action sequence similarity to select exemplars for ICL, enhancing the model's planning performance with a simple yet effective method.
- GRASE-DC generalizes well to more complex tasks.
- The authors have made an effort to keep the exemplar selection process efficient.
1. Using clustering algorithms to improve the relevance and diversity of the selected exemplars has been proposed before by another paper [1].
2. The selected evaluation datasets lack simulated real-world tasks such as ALFWorld, Mind2Web, ScienceWorld, etc.
3. Although the VAL mechanism is taken from another paper, the authors should introduce it briefly to improve readability, as VAL appears frequently in the paper.

[1] Automatic Chain of Thought Pro
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
