Robotic Control via Embodied Chain-of-Thought Reasoning

Michał Zawalski; William Chen; Karl Pertsch; Oier Mees; Chelsea Finn; Sergey Levine

arXiv:2407.08693·cs.RO·March 10, 2025·1 cites

Open Access 3 Reviews

TL;DR

This paper introduces Embodied Chain-of-Thought Reasoning (ECoT), a method enabling vision-language-action models to perform multi-step, grounded reasoning about tasks and observations, significantly improving robot control robustness and interpretability.

Contribution

The paper presents ECoT, a novel training pipeline for VLAs that incorporates multi-step, grounded reasoning, enhancing generalization and interpretability without extra robot data.

Findings

01 ECoT increases the absolute success rate of the OpenVLA policy by 28% across challenging generalization tasks.

02 ECoT makes policy failures easier for humans to interpret and lets them correct robot behavior using natural language.

03 ECoT achieves these gains without any additional robot training data.

Abstract

A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities of large vision-language models in other domains is their ability to reason iteratively through complex problems. Can that same capability be brought into robotics to allow policies to improve performance by reasoning about a given task before acting? Naive use of "chain-of-thought" (CoT) style prompting is significantly less effective with standard VLAs because of the relatively simple training examples that are available to them. Additionally, purely semantic reasoning about sub-tasks, as is…

Peer Reviews

Decision·CoRL 2024

Reviewer 01 · Rating 4 · Confidence 4

Reviewer 02 · Rating 4 · Confidence 4

Reviewer 03 · Rating 4 · Confidence 4

Taxonomy

Topics: Reinforcement Learning in Robotics · Embodied and Extended Cognition