Spatial Reasoning and Planning for Deep Embodied Agents
Shu Ishida

TL;DR
This thesis presents novel data-driven methods for spatial reasoning and planning in embodied agents, focusing on learning efficiency, interpretability, and transferability across diverse environments.
Contribution
It introduces four key techniques: CALVIN for interpretable world modeling, SOAP for unsupervised option discovery, LangProp for code-based reasoning with LLMs, and Voggite for complex task solving in Minecraft.
Findings
CALVIN successfully navigates 3D environments using learned models.
SOAP demonstrates robust performance on long-horizon and benchmark tasks.
LangProp generates interpretable code that achieves high performance in autonomous driving.
Abstract
Humans can perform complex tasks with long-term objectives by planning, reasoning, and forecasting the outcomes of actions. For embodied agents to achieve similar capabilities, they must acquire knowledge of the environment that transfers to novel scenarios with a limited budget of additional trial and error. Learning-based approaches, such as deep RL, can discover and exploit inherent regularities and characteristics of the application domain from data and continuously improve their performance, but at the cost of large amounts of training data. This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks, focusing on enhancing learning efficiency, interpretability, and transferability across novel scenarios. Four key contributions are made. 1) CALVIN, a differentiable planner that learns interpretable models of the world for long-term…
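The abstract characterizes CALVIN only as a planner that learns interpretable models of the world. To make the planning step concrete, here is a minimal sketch assuming a value-iteration-style planner on a discrete grid, with a hand-written reward map, deterministic 4-connected moves, and a terminal goal. In a learned, differentiable planner the reward and transition terms would be predicted by a network and the backups unrolled so that gradients can flow through them; none of that is shown here. The grid, `value_iteration`, `greedy_action`, `MOVES`, and all numbers are hypothetical illustrations, not the thesis's implementation.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Cell = Tuple[int, int]

# Hypothetical action set: deterministic 4-connected moves.
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def neighbors(cell: Cell, blocked: List[List[bool]]) -> Iterator[Tuple[str, Cell]]:
    """Yield (action, next_cell) for every in-bounds, non-obstacle move."""
    rows, cols = len(blocked), len(blocked[0])
    r, c = cell
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and not blocked[nr][nc]:
            yield name, (nr, nc)


def value_iteration(reward: List[List[float]], blocked: List[List[bool]],
                    goal: Cell, gamma: float = 0.9, iters: int = 50) -> List[List[float]]:
    """Bellman backups V(s) = max_a [R(s') + gamma * V(s')] on a grid.

    reward[r][c] is the reward for entering cell (r, c). The goal is
    terminal, so its value stays 0; obstacles are never entered.
    """
    rows, cols = len(reward), len(reward[0])
    V = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_V = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if blocked[r][c] or (r, c) == goal:
                    continue  # obstacles and the terminal goal keep value 0
                q_values = [reward[nr][nc] + gamma * V[nr][nc]
                            for _, (nr, nc) in neighbors((r, c), blocked)]
                new_V[r][c] = max(q_values, default=0.0)
        V = new_V
    return V


def greedy_action(V: List[List[float]], reward: List[List[float]],
                  blocked: List[List[bool]], cell: Cell,
                  gamma: float = 0.9) -> Optional[str]:
    """Pick the move with the highest one-step lookahead value."""
    def q(move: Tuple[str, Cell]) -> float:
        _, (nr, nc) = move
        return reward[nr][nc] + gamma * V[nr][nc]

    best = max(neighbors(cell, blocked), key=q, default=(None, cell))
    return best[0]


# Toy 3x3 example: goal at the top-right, one obstacle in the centre.
blocked = [[False, False, False],
           [False, True, False],
           [False, False, False]]
goal = (0, 2)
reward = [[-0.1] * 3 for _ in range(3)]  # small step cost everywhere...
reward[0][2] = 1.0                        # ...and +1 for entering the goal
V = value_iteration(reward, blocked, goal)
print(greedy_action(V, reward, blocked, (0, 0)))  # → right
```

Because the values are explicit per cell, the resulting plan can be inspected directly, which is one sense in which a planner of this kind can be called interpretable.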
Taxonomy
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator
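The taxonomy tags Proximal Policy Optimization and entropy regularization without saying where the thesis applies them. For reference, the following is a minimal sketch of the standard PPO clipped surrogate loss with an entropy bonus, written in plain Python so the arithmetic is visible. The function name `ppo_loss`, its parameters, and the defaults `clip_eps=0.2` and `ent_coef=0.01` are assumptions for illustration, not settings taken from the thesis, and the value-function loss is omitted.

```python
import math
from typing import Sequence


def ppo_loss(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    action_probs: Sequence[Sequence[float]],
    clip_eps: float = 0.2,
    ent_coef: float = 0.01,
) -> float:
    """Negative of (PPO clipped surrogate + entropy bonus), to be minimized.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    action_probs: the current policy's full action distribution per sample,
    used only for the entropy term.
    """
    n = len(advantages)
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        surrogate += min(ratio * adv, clipped * adv)
    surrogate /= n

    # Mean entropy H(pi) = -sum_a p(a) log p(a); terms with p = 0 contribute 0.
    entropy = sum(
        -sum(p * math.log(p) for p in dist if p > 0.0) for dist in action_probs
    ) / len(action_probs)

    # Maximizing surrogate + ent_coef * entropy is minimizing its negative.
    return -(surrogate + ent_coef * entropy)


# Ratio 1.5 with a positive advantage is clipped to 1 + clip_eps = 1.2.
print(ppo_loss([math.log(1.5)], [0.0], [1.0], [[1.0]], ent_coef=0.0))
```

The entropy bonus rewards spread-out action distributions, which discourages the policy from collapsing onto a single action too early; in a full implementation this loss would be computed on minibatches and minimized with a gradient-based optimizer.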
