One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration

Jinbang Huang; Yixin Xiao; Zhanguang Zhang; Mark Coates; Jianye Hao; Yingxue Zhang

arXiv:2505.18382·cs.RO·February 13, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces PDDLLM, a framework that automatically derives planning domains from a single demonstration using LLM reasoning and simulation, significantly improving robotic long-horizon task planning.

Contribution

PDDLLM automatically induces symbolic planning domains from demonstrations without manual input, enhancing automation and success rates in robotic planning tasks.

Findings

1. Achieved at least 20% higher success rates than baselines.
2. Reduced token costs in planning.
3. Successfully deployed on multiple physical robots.

Abstract

Pre-trained large language models (LLMs) show promise for robotic task planning but often struggle to guarantee correctness in long-horizon problems. Task and motion planning (TAMP) addresses this by grounding symbolic plans in low-level execution, yet it relies heavily on manually engineered planning domains. To improve long-horizon planning reliability and reduce human intervention, we present Planning Domain Derivation with LLMs (PDDLLM), a framework that automatically induces symbolic predicates and actions directly from demonstration trajectories by combining LLM reasoning with physical simulation roll-outs. Unlike prior domain-inference methods that rely on partially predefined or language descriptions of planning domains, PDDLLM constructs domains without manual domain initialization and automatically integrates them with motion planners to produce executable plans, enhancing…
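To make "symbolic predicates and actions" concrete, the following is a minimal STRIPS-style sketch of what a planning domain contains. It is not PDDLLM's implementation or output: the `Action` class, the predicate names (`on`, `clear`, `hand_empty`, `holding`), and the `pick` action are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# A ground fact such as ("on", "block_a", "table"). Predicate names are hypothetical.
Fact = tuple[str, ...]


@dataclass(frozen=True)
class Action:
    """A ground symbolic action with STRIPS-style preconditions and effects."""
    name: str
    preconditions: frozenset[Fact]
    add_effects: frozenset[Fact]
    del_effects: frozenset[Fact]

    def applicable(self, state: frozenset[Fact]) -> bool:
        # The action can run only if every precondition holds in the state.
        return self.preconditions <= state

    def apply(self, state: frozenset[Fact]) -> frozenset[Fact]:
        # Remove the deleted facts, then add the new ones.
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable")
        return (state - self.del_effects) | self.add_effects


# Hypothetical "pick" action for a single block (illustrative only).
pick_a = Action(
    name="pick(block_a)",
    preconditions=frozenset(
        {("on", "block_a", "table"), ("clear", "block_a"), ("hand_empty",)}
    ),
    add_effects=frozenset({("holding", "block_a")}),
    del_effects=frozenset({("on", "block_a", "table"), ("hand_empty",)}),
)

initial_state = frozenset(
    {("on", "block_a", "table"), ("clear", "block_a"), ("hand_empty",)}
)
next_state = pick_a.apply(initial_state)
```

A symbolic planner searches over sequences of such actions; in a TAMP setting, each chosen action would then still need to be grounded by a motion planner before execution.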

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

This work targets an interesting and meaningful problem in planning. The proposed methodology does not rely on a pre-defined predicate space or action model, which reduces the human annotation effort. Experiments in real robot environments demonstrate the effectiveness of the proposed method.

Weaknesses

**Major** Successful deployment of the proposed method requires a perception function that can accurately extract continuous states from objects. It is unclear which types of perception function the proposed method works well with. The paper does not extensively discuss how robust the method is to noise in the perception process. It also seems that applying the method in real applications requires setting up an identical digital copy of the scene in simulation.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. This paper aims to automatically generate planning domains from one demonstration to reduce manual engineering efforts, which is a valuable goal for the field.
2. Experiments on 9 tasks show that the proposed method achieves a high success rate, outperforming other baselines.

Weaknesses

1. The paper relies on a physics simulator to evaluate the physical feasibility of predicates. However, such simulation-based evaluation may fail to capture complex dynamics, limiting the method’s generalization to real-world settings. The current experiments only involve simple rigid-body interactions, so it remains unclear how the proposed approach would perform with more complex objects such as deformable materials or fluids.
2. There are several unclear aspects in the paper: (1) It is not cl

Reviewer 03 · Rating 6 · Confidence 2

Strengths

The submission tackles the costly manual domain-spec bottleneck in TAMP and positions the work among LLM planners and domain-inference lines of work. The end-to-end automation pipeline (predicate imagination, action invention and LoCA) is, in my understanding, the main contribution/novelty. The tasks are varied in difficulty and nature (Tower of Hanoi, bridge building, burger cooking), and multiple SOTA baselines are included (LLMTAMP, LLMTAMP-FF/FR, o1-TAMP, R1-TAMP, RuleAsMem). Analysis of ti

Weaknesses

Since the pipeline still relies on some hand-chosen design choices (e.g. $u$ for subspace granularity), statements about constructing domains "without manual predesign" might be exaggerated. How often does the limited operator set miss required invariants? First-order predicates come from discretized subspaces while higher-order ones use a limited set of logical operators/quantifiers. The limitations section admits missing complex predicates (e.g., ordering constraints), which can materially af

Taxonomy

Topics · Natural Language Processing Techniques