Action-Free Reasoning for Policy Generalization
Jaden Clark, Suvir Mirchandani, Dorsa Sadigh, Suneel Belkhale

TL;DR
This paper introduces RAD, a reasoning-based approach that leverages abundant action-free human videos with reasoning annotations to train robot policies, improving generalization across embodiment gaps and reducing reliance on resource-intensive demonstration data.
Contribution
The paper proposes a novel reasoning-based framework, RAD, that learns from both robot demonstrations and action-free human videos, enabling better policy transfer and generalization in robotics.
Findings
RAD improves transfer across embodiment gaps.
Scaling action-free data enhances policy performance.
Reasoning-driven learning enables generalization to new tasks.
Abstract
End-to-end imitation learning offers a promising approach for training robot policies. However, generalizing to new settings remains a significant challenge. Although large-scale robot demonstration datasets have shown potential for inducing generalization, they are resource-intensive to scale. In contrast, human video data is abundant and diverse, presenting an attractive alternative. Yet, these human-video datasets lack action labels, complicating their use in imitation learning. Existing methods attempt to extract grounded action representations (e.g., hand poses), but resulting policies struggle to bridge the embodiment gap between human and robot actions. We propose an alternative approach: leveraging language-based reasoning from human videos, which is essential for guiding robot actions, to train generalizable robot policies. Building on recent advances in reasoning-based policy…
Taxonomy
Topics: Logic, Reasoning, and Knowledge
