Knowledge Model Prompting Increases LLM Performance on Planning Tasks

Erik Goh; John Kos; Ashok Goel

arXiv:2602.03900·cs.AI·February 5, 2026


Open Access · 3 Reviews

TL;DR

This paper demonstrates that using the Task-Method-Knowledge (TMK) framework as a prompting strategy significantly enhances large language models' reasoning and planning abilities, especially in complex symbolic tasks, surpassing previous methods.

Contribution

It introduces TMK prompting for LLMs, leveraging explicit task decomposition and hierarchical reasoning to improve performance on planning benchmarks like PlanBench.
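To make the idea of explicit task decomposition concrete, here is a minimal sketch of how a Blocksworld problem might be laid out in a Task-Method-Knowledge structure and serialized into a prompt. It is not the authors' prompt: the function names (`build_tmk_model`, `build_tmk_prompt`), the field names (`given`, `makes`, `achieved_by`, `why`), the method decomposition, and the use of JSON are all assumptions made for illustration.

```python
import json


def build_tmk_model(blocks, initial_state, goal_state):
    """Return a hypothetical TMK-style description of a Blocksworld problem."""
    return {
        # Task: what must be achieved, stated as start and goal conditions.
        "task": {
            "name": "ReachGoalConfiguration",
            "given": list(initial_state),
            "makes": list(goal_state),
            "achieved_by": "ClearThenBuild",
        },
        # Method: how the task is decomposed, with a reason for each step.
        "method": {
            "name": "ClearThenBuild",
            "steps": [
                {
                    "subtask": "PutAllBlocksOnTable",
                    "why": "a block can only be moved when nothing is on top of it",
                },
                {
                    "subtask": "StackBottomUp",
                    "why": "each goal tower must be built from its base upward",
                },
            ],
        },
        # Knowledge: the domain vocabulary the task and method refer to.
        "knowledge": {
            "objects": list(blocks),
            "relations": ["on(x, y)", "on_table(x)", "clear(x)", "holding(x)"],
            "actions": ["pick-up", "put-down", "stack", "unstack"],
        },
    }


def build_tmk_prompt(blocks, initial_state, goal_state):
    """Wrap the TMK model in a short instruction; the wording is an assumption."""
    model = build_tmk_model(blocks, initial_state, goal_state)
    return (
        "Use the following Task-Method-Knowledge model to plan.\n"
        + json.dumps(model, indent=2)
        + "\nReturn the plan as one action per line."
    )


if __name__ == "__main__":
    print(
        build_tmk_prompt(
            blocks=["A", "B", "C"],
            initial_state=["on(C, A)", "on_table(A)", "on_table(B)", "clear(B)", "clear(C)"],
            goal_state=["on(A, B)", "on(B, C)"],
        )
    )
```

The sketch keeps the goal (task), the decomposition with its rationale (method), and the domain vocabulary (knowledge) in separate fields, which is one plausible reading of "explicit task decomposition and hierarchical reasoning".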

Findings

01. TMK prompting achieved up to 97.3% accuracy on Blocksworld tasks.

02. TMK prompting significantly improved performance over traditional prompting methods.

03. TMK prompts help models engage formal reasoning pathways instead of linguistic modes.

Abstract

Large Language Models (LLMs) can struggle with reasoning and planning tasks. Many prompting techniques have been developed to assist LLM reasoning, notably Chain-of-Thought (CoT); however, these techniques have themselves come under scrutiny as LLMs' ability to reason at all has been called into question. Borrowing from cognitive and educational science, this paper investigates whether the Task-Method-Knowledge (TMK) framework can improve LLM reasoning capabilities beyond its previously demonstrated success in educational applications. The TMK framework's ability to capture causal, teleological, and hierarchical reasoning structures, combined with its explicit task decomposition mechanisms, makes it particularly well suited to addressing language model reasoning deficiencies, and unlike other hierarchical frameworks such as HTN and BDI, TMK provides explicit…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 0 · Confidence 3

Strengths

The method targets a key problem of long-reasoning LLMs: they lack clear task decomposition during thinking.

Weaknesses

+ Many fields are missing in Table 2, especially Plain Text + One Shot. Note that it is unfair to compare the other two columns (TMK + One Shot vs. Plain Text + Zero Shot). The paper has no other main results.
+ It does not make sense to replace the standard Blocksworld description with irrelevant mystery or random words. No LLM learns such words during pre-training, nor will people use them to describe tasks or use LLMs this way.
+ It's not easy to understand what TMK is doing (

Reviewer 02 · Rating 0 · Confidence 5

Strengths

The paper shows the potential of applying a prompting technique to improve model performance on the Blocksworld task.

Weaknesses

1. The reviewer finds it difficult to understand the core idea and contribution of the TMK prompting method proposed in the paper. The paper explains the proposed prompting method poorly.
2. The paper proposes a prompting method, yet it contains no examples of the prompt.
3. Experimental evidence for the advantage of the proposed method is very limited: only OpenAI models, only one task (Blocksworld), and many numbers are missing (Table 2).

Reviewer 03 · Rating 2 · Confidence 3

Strengths

• The paper introduces a simple and interpretable idea: representing planning tasks in a structured TMK format may align with how procedural knowledge is expressed in model pre-training data.
• The approach is prompt-based and requires no fine-tuning or external resources, making it easy to reproduce and extend.
• The empirical trend (larger gains for weaker models) is intuitive and suggests the TMK structure provides helpful inductive bias.

Weaknesses

• The experiments are narrow: only one domain and one benchmark family. Claims about general planning improvement are therefore not well supported.
• There are no comparisons with other structured prompting methods (e.g., CoS, ReAct, least-to-most, chain-of-thought scaffolding). It is unclear whether TMK offers advantages beyond simply using more structured JSON templates.
• The explanation of why TMK helps remains speculative. No ablation isolates whether improvements come from


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)