Embodied Task Planning via Graph-Informed Action Generation with Large Language Models
Xiang Li, Ning Yan, Masood Mortazavi

TL;DR
This paper introduces GiG, a graph-informed planning framework for LLM-based embodied agents that improves long-horizon task decomposition and adherence to environment constraints.
Contribution
GiG structures agent memory with a Graph-in-Graph architecture, combining graph neural networks (GNNs) with symbolic logic to improve planning and the retrieval of relevant past experiences.
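The listing says GiG pairs GNNs with symbolic logic and targets adherence to environment constraints, but it does not say how the symbolic component works. As a generic illustration only, a STRIPS-style check can catch a hallucinated state transition before execution: simulate each proposed action against a set of facts and stop at the first step whose preconditions do not hold. `Action`, `first_violation`, and the fact strings are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action; facts are plain strings."""
    name: str
    preconditions: frozenset = frozenset()
    add_effects: frozenset = frozenset()
    del_effects: frozenset = frozenset()


def first_violation(state, plan):
    """Simulate `plan` from `state`. Return (step index, missing facts) for the
    first action whose preconditions are not satisfied, or None if all hold."""
    current = set(state)
    for i, action in enumerate(plan):
        missing = action.preconditions - current
        if missing:
            return i, missing
        # Apply the action's effects to get the next symbolic state.
        current -= action.del_effects
        current |= action.add_effects
    return None


# Illustrative usage with made-up facts.
start = {"hand_empty", "at(apple)"}
pick = Action("pick(apple)",
              frozenset({"hand_empty", "at(apple)"}),
              frozenset({"holding(apple)"}),
              frozenset({"hand_empty"}))
place = Action("place(apple, fridge)",
               frozenset({"holding(apple)", "open(fridge)"}),
               frozenset({"in(apple, fridge)", "hand_empty"}),
               frozenset({"holding(apple)"}))

print(first_violation(start, [pick]))         # -> None
print(first_violation(start, [pick, place]))  # -> (1, frozenset({'open(fridge)'}))
```

A planner could use the returned step index and missing facts to re-prompt the LLM for a repaired sub-plan instead of executing an invalid action; whether GiG does anything like this is not stated in the listing.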
Findings
Achieves up to 37% performance improvement on benchmarks.
Outperforms state-of-the-art baselines in embodied planning tasks.
Maintains computational costs lower than or comparable to those of the baselines.
Abstract
While Large Language Models (LLMs) have demonstrated strong zero-shot reasoning capabilities, their deployment as embodied agents still faces fundamental challenges in long-horizon planning. Unlike open-ended text generation, embodied planning requires agents to decompose high-level intents into actionable sub-goals while adhering to the constraints of a dynamic environment. Standard LLM planners frequently either lose strategic coherence over extended horizons because of context-window limitations, or hallucinate state transitions that violate environment constraints. We propose GiG, a planning framework that structures an embodied agent's memory using a Graph-in-Graph architecture. Our approach employs a Graph Neural Network (GNN) to encode environmental states into embeddings and organizes these embeddings into action-connected execution trace graphs within an experience memory bank. GiG enables retrieval…
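The abstract names three pieces: a GNN that turns environment states into embeddings, execution trace graphs whose nodes are those embeddings and whose edges are actions, and an experience memory bank used for retrieval. The toy sketch below shows how those pieces could fit together under our own assumptions; it is not the authors' implementation. We assume each state is a small scene graph (the "inner" graph) and each trace is a chain of state embeddings (the "outer" graph). The names `encode_scene_graph`, `ExecutionTrace`, and `MemoryBank` are hypothetical. A fixed, untrained mean-aggregation step stands in for the learned GNN, and cosine similarity is used for retrieval, which the truncated abstract does not specify.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


def encode_scene_graph(features: dict[str, Vector],
                       edges: list[tuple[str, str]], rounds: int = 2) -> Vector:
    """Parameter-free stand-in for a trained GNN: each round averages a node's
    vector with the mean of its neighbours' vectors; nodes are then mean-pooled."""
    neighbours = {n: [] for n in features}
    for u, v in edges:  # assumption: scene-graph relations treated as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    h = {n: list(vec) for n, vec in features.items()}
    for _ in range(rounds):
        new_h = {}
        for n, vec in h.items():
            nbrs = neighbours[n]
            if nbrs:
                agg = [sum(h[m][i] for m in nbrs) / len(nbrs) for i in range(len(vec))]
                new_h[n] = [0.5 * a + 0.5 * b for a, b in zip(vec, agg)]
            else:
                new_h[n] = vec
        h = new_h
    dim = len(next(iter(h.values())))
    return [sum(vec[i] for vec in h.values()) / len(h) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ExecutionTrace:
    """Hypothetical outer graph: state embeddings linked by the actions taken.
    `states` must contain the initial state before add_step is called."""
    task: str
    states: list[Vector] = field(default_factory=list)
    actions: list[tuple[int, int, str]] = field(default_factory=list)

    def add_step(self, action: str, next_state: Vector) -> None:
        self.states.append(next_state)
        # Edge from the previous state node to the new one, labelled by action.
        self.actions.append((len(self.states) - 2, len(self.states) - 1, action))


class MemoryBank:
    """Hypothetical experience store ranked by best state-level cosine match."""

    def __init__(self) -> None:
        self.traces: list[ExecutionTrace] = []

    def add(self, trace: ExecutionTrace) -> None:
        self.traces.append(trace)

    def retrieve(self, query: Vector, k: int = 1) -> list[tuple[float, ExecutionTrace]]:
        scored = [(max(cosine(query, s) for s in t.states), t)
                  for t in self.traces if t.states]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Illustrative usage with made-up 2-d node features.
kitchen = encode_scene_graph({"agent": [1.0, 0.0], "fridge": [0.0, 1.0]},
                             [("agent", "fridge")])
print(kitchen)  # -> [0.5, 0.5]

trace_a = ExecutionTrace("put apple in fridge", [[1.0, 0.0]])
trace_a.add_step("open(fridge)", kitchen)
trace_b = ExecutionTrace("turn on lamp", [[0.0, 1.0]])
bank = MemoryBank()
bank.add(trace_a)
bank.add(trace_b)
print(bank.retrieve([0.9, 0.1], k=1)[0][1].task)  # -> put apple in fridge
```

Scoring a trace by its best-matching state lets a query hit a trace partway through, which suits retrieving sub-plans for long-horizon tasks; a trained GNN and an approximate nearest-neighbour index would replace the toy pieces in practice.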
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Artificial Intelligence in Games
