$\pi^{*}_{0.6}$: a VLA That Learns From Experience

Physical Intelligence; Ali Amin; Raichelle Aniceto; Ashwin Balakrishna; Kevin Black; Ken Conley; Grace Connors; James Darpinian; Karan Dhabalia; Jared DiCarlo; Danny Driess; Michael Equi; Adnan Esmail; Yunhao Fang; Chelsea Finn; Catherine Glossop; Thomas Godden; Ivan Goryachev; Lachy Groom; Hunter Hancock; Karol Hausman; Gashon Hussein; Brian Ichter; Szymon Jakubczak; Rowan Jen; Tim Jones; Ben Katz; Liyiming Ke; Chandra Kuchi; Marinda Lamb; Devin LeBlanc; Sergey Levine; Adrian Li-Bell; Yao Lu; Vishnu Mano; Mohith Mothukuri; Suraj Nair; Karl Pertsch; Allen Z. Ren; Charvi Sharma; Lucy Xiaoyang Shi; Laura Smith; Jost Tobias Springenberg; Kyle Stachowicz; Will Stoeckle; Alex Swerdlow; James Tanner; Marcel Torne; Quan Vuong; Anna Walling; Haohuan Wang; Blake Williams; Sukwon Yoo; Lili Yu; Ury Zhilinsky; Zhiyuan Zhou

arXiv:2511.14759 · cs.LG · November 20, 2025

Open Access

TL;DR

This paper introduces RECAP, a reinforcement learning method that improves vision-language-action models through real-world experience, enabling robots to perform complex tasks like laundry and espresso making with higher efficiency and reliability.

Contribution

The paper presents RECAP, a novel RL approach that integrates heterogeneous data for training and fine-tuning a generalist VLA model called $\pi^{*}_{0.6}$ for real-world robotic tasks.

Findings

1. RECAP significantly increases task throughput.
2. RECAP reduces task failure rates.
3. The $\pi^{*}_{0.6}$ model successfully performs complex household tasks.

Abstract

We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $\pi^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $\pi^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional…
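
The abstract names advantage conditioning but does not say how advantages are computed or passed to the policy. The sketch below shows one generic way to do it; it is not the paper's implementation. It assumes a Monte Carlo return-to-go minus a critic's value estimate as the advantage, a binary "good action" label obtained by thresholding, and that human corrections are always labeled as good. The names `Transition`, `returns_to_go`, `advantage_labels`, and the `threshold` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    """One logged step. All fields are hypothetical, for illustration only."""
    reward: float                # per-step reward
    value: float                 # critic's estimate of the return from this step
    is_correction: bool = False  # True if the action came from a human intervention


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted sum of rewards from each step to the end of the episode."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    out.reverse()
    return out


def advantage_labels(
    episode: List[Transition], threshold: float = 0.0, gamma: float = 1.0
) -> List[bool]:
    """Binary conditioning label per step.

    Assumption: advantage = return-to-go minus the value estimate, and a step
    is labeled good when the advantage exceeds `threshold` or when the action
    was a human correction.
    """
    rets = returns_to_go([t.reward for t in episode], gamma)
    labels = []
    for t, g in zip(episode, rets):
        advantage = g - t.value
        labels.append(t.is_correction or advantage > threshold)
    return labels


if __name__ == "__main__":
    episode = [
        Transition(reward=-1.0, value=-3.0),                      # better than the critic expected
        Transition(reward=-1.0, value=-1.0, is_correction=True),  # human intervention
        Transition(reward=0.0, value=0.0),                        # exactly as expected
    ]
    print(advantage_labels(episode))
```

Under these assumptions, the labels would be attached to each training example as an extra conditioning input, and at deployment the policy would be queried with the "good" label so that it favors actions resembling those with positive advantage.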

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning