Robotic Control via Embodied Chain-of-Thought Reasoning
Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine

TL;DR
This paper introduces Embodied Chain-of-Thought Reasoning (ECoT), a method enabling vision-language-action models to perform multi-step, grounded reasoning about tasks and observations, significantly improving robot control robustness and interpretability.
Contribution
The paper presents ECoT, a novel training pipeline for VLAs that incorporates multi-step, grounded reasoning, enhancing generalization and interpretability without extra robot data.
Findings
ECoT increases the absolute success rate of VLA policies by 28% on challenging generalization tasks.
ECoT improves interpretability and error correction of robot policies.
ECoT does not require additional robot training data.
Abstract
A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities of large vision-language models in other domains is their ability to reason iteratively through complex problems. Can that same capability be brought into robotics to allow policies to improve performance by reasoning about a given task before acting? Naive use of "chain-of-thought" (CoT) style prompting is significantly less effective with standard VLAs because of the relatively simple training examples that are available to them. Additionally, purely semantic reasoning about sub-tasks, as is…
Peer Reviews
Decision·CoRL 2024
Taxonomy
Topics: Reinforcement Learning in Robotics · Embodied and Extended Cognition
