Learning Planning Abstractions from Language
Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu

TL;DR
This paper introduces PARL, a framework that learns symbolic state and action abstractions from language-annotated demonstrations to improve planning in decision-making tasks involving novel objects and environments.
Contribution
PARL is a novel framework that automatically discovers symbolic abstractions from language, enabling generalization across unseen scenarios and longer planning horizons.
Findings
PARL successfully generalizes to new objects and environments.
It improves planning efficiency with learned abstractions.
PARL outperforms baseline methods in complex tasks.
Abstract
This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given the task description, PARL first makes abstract action plans using the latent transition and feasibility functions, then refines the high-level plan using low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer…
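The inference procedure described in the abstract (plan over abstract actions using feasibility and transition functions, then refine with low-level policies) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the set-of-strings state, the hand-written `feasible` and `transition` functions (stand-ins for PARL's learned latent models), the breadth-first search, and the toy key-door action list are hypothetical and not taken from the paper.

```python
from collections import deque
from typing import Callable, FrozenSet, List, Optional, Tuple

# Hypothetical stand-ins: PARL learns a latent state abstraction; this
# sketch uses a frozenset of string facts so the example stays runnable.
State = FrozenSet[str]
Action = Tuple[str, str]  # (action concept, object concept)

# Toy abstract action space for an assumed key-door task.
ACTIONS: List[Action] = [("pickup", "key"), ("open", "door"), ("goto", "goal")]


def feasible(state: State, action: Action) -> bool:
    """Hand-written placeholder for a learned feasibility function."""
    verb, obj = action
    if verb == "pickup":
        return f"holding-{obj}" not in state
    if verb == "open":
        return "holding-key" in state and "door-open" not in state
    if verb == "goto":
        return "door-open" in state
    return False


def transition(state: State, action: Action) -> State:
    """Hand-written placeholder for a learned abstract transition model."""
    verb, obj = action
    if verb == "pickup":
        return state | {f"holding-{obj}"}
    if verb == "open":
        return state | {"door-open"}
    if verb == "goto":
        return state | {f"at-{obj}"}
    return state


def plan_abstract(
    init: State,
    goal_test: Callable[[State], bool],
    max_depth: int = 6,
) -> Optional[List[Action]]:
    """Breadth-first search over feasible abstract actions (assumed search)."""
    frontier = deque([(init, [])])
    visited = {init}
    while frontier:
        state, plan = frontier.popleft()
        if goal_test(state):
            return plan
        if len(plan) >= max_depth:
            continue
        for action in ACTIONS:
            if not feasible(state, action):
                continue  # prune actions predicted to be infeasible
            nxt = transition(state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None


def low_level_policy(action: Action) -> List[str]:
    """Placeholder low-level policy: expands one abstract action."""
    verb, obj = action
    return [f"move_to({obj})", f"{verb}({obj})"]


def refine(plan: List[Action]) -> List[str]:
    """Refine the high-level plan into primitive steps, action by action."""
    return [step for action in plan for step in low_level_policy(action)]


plan = plan_abstract(frozenset(), lambda s: "at-goal" in s)
steps = refine(plan)
print(plan)
print(steps)
```

The split between `plan_abstract` and `refine` mirrors the two inference phases named in the abstract; in the actual system both the feasibility check and the transition would be learned functions over latent states rather than rules over symbolic facts.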
Peer Reviews
Decision: ICLR 2024 poster
* Originality: The novel aspect of the presented method is combining symbolic planning with action predicates extracted from natural language goal descriptions or instructions and latent space representation learning with point cloud transformers. The abstract planning is done at the symbolic level, and the abstract state transitions are tracked with object-centric representation learned from segmented pixel data. * Quality: The overall description of the method is easy to understand and the exp
* Originality: Individual components are existing approaches, and the originality lies in bringing those components together and implementing an agent to solve mini-grid and kitchen world problems. * Quality: Due to missing details, it is difficult to assess the quality. * Clarity: There are many missing details in the paper. * Significance: The comparison is made only against simpler baselines (end-to-end RL and behavior cloning).
Originality: the paper proposes a novel way to solve sequential decision-making problems by combining LLM prompting, imitation learning, and traditional reinforcement learning. Quality: the paper places the work in the literature very well, compares the differences between previous works, and mentions future work. The problem formulation is mostly clearly written up. Clarity: The paper is mostly well-written and, apart from a few inconsistencies, easy to understand. Significance: its originalit
I have three main reservations: - There is no available code and there are no experimental details (chosen hyperparameters, tuning) about the algorithm or the baselines, and as such the results are not reproducible - The experiments themselves, the results, and the metrics are described at a very high level without details, which does not allow the reader to verify how well they support the claims. - Scalability: as the number of actions, objects and their combinations increases, the necessary training
**Originality:** The paper investigates a problem that is not tackled in the literature but can realistically exist. The paper presents a novel and creative framework for addressing this problem. **Clarity:** The paper is well-written. **Significance:** This work has the potential to be impactful in language-based agent interactions. Furthermore, the framework can be adapted to other sequential planning domains. **Quality:** The problem described is well-motivated. The approach to addressing the proble
I don't have any major gripes. However, I found the description of the experimental domains lacking. In particular, I am not totally clear on the difference between the key-door and two-corridor environments.
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
