Training Strategies for Efficient Embodied Reasoning

William Chen; Suneel Belkhale; Suvir Mirchandani; Oier Mees; Danny Driess; Karl Pertsch; Sergey Levine

arXiv:2505.08243·cs.RO·May 20, 2025

TL;DR

This paper investigates why robot chain-of-thought reasoning improves policy performance, identifies key mechanisms, and proposes lightweight alternatives that enhance efficiency and accuracy in vision-language-action models.

Contribution

The paper gives a mechanistic account of why robot reasoning helps policies and introduces simple, efficient reasoning strategies that outperform existing methods.

Findings

01. Reasoning improves representation learning and action prediction.

02. Attending to generated reasoning enhances policy performance.

03. The proposed methods achieve state-of-the-art results and faster inference.

Abstract

Robot chain-of-thought reasoning (CoT) -- wherein a model predicts helpful intermediate representations before choosing actions -- provides an effective method for improving the generalization and performance of robot policies, especially vision-language-action models (VLAs). While such approaches have been shown to improve performance and generalization, they suffer from core limitations, like needing specialized robot reasoning data and slow inference speeds. To design new robot reasoning approaches that address these issues, a more complete characterization of why reasoning helps policy performance is critical. We hypothesize several mechanisms by which robot reasoning improves policies -- (1) better representation learning, (2) improved learning curricularization, and (3) increased expressivity -- then devise simple variants of robot CoT reasoning to isolate and test each one. We…
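As a rough illustration of the trade-off the abstract describes, the sketch below contrasts a policy that generates intermediate reasoning before its action with one that emits the action directly, and counts decode steps for each. Everything here is a hypothetical stand-in rather than the paper's models or API: `toy_reason`, `toy_action`, and `act` are made-up names, the 7-dimensional action is an assumed size, and the step count assumes one autoregressive decode step per generated token.

```python
from typing import List, Sequence, Tuple


def toy_reason(observation: str, instruction: str) -> List[str]:
    """Hypothetical reasoning stage: emit intermediate tokens before acting."""
    return ["plan:", instruction, "subtask:", "reach", "visible:", observation]


def toy_action(observation: str, instruction: str,
               reasoning: Sequence[str]) -> List[float]:
    """Hypothetical action stage: placeholder values, assumed 7-dim action."""
    return [0.0] * 7


def act(observation: str, instruction: str,
        use_cot: bool) -> Tuple[List[str], List[float], int]:
    """Return (reasoning, action, decode_steps) for one control step."""
    # With CoT, the reasoning is generated first and the action conditions on it.
    reasoning = toy_reason(observation, instruction) if use_cot else []
    action = toy_action(observation, instruction, reasoning)
    # Assumption: each generated token costs one autoregressive decode step.
    decode_steps = len(reasoning) + len(action)
    return reasoning, action, decode_steps


if __name__ == "__main__":
    for use_cot in (True, False):
        _, _, steps = act("mug", "pick up the mug", use_cot)
        print(f"use_cot={use_cot}: {steps} decode steps")
```

In this toy setup the extra cost of reasoning is exactly the number of reasoning tokens generated per control step, which is the kind of inference overhead the abstract refers to.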

Code & Models

Datasets

Embodied-CoT/embodied_features_and_demos_libero
dataset · 537 downloads