${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities

Physical Intelligence; Bo Ai; Ali Amin; Raichelle Aniceto; Ashwin Balakrishna; Greg Balke; Kevin Black; George Bokinsky; Shihao Cao; Thomas Charbonnier; Vedant Choudhary; Foster Collins; Ken Conley; Grace Connors; James Darpinian; Karan Dhabalia; Maitrayee Dhaka; Jared DiCarlo; Danny Driess; Michael Equi; Adnan Esmail; Yunhao Fang; Chelsea Finn; Catherine Glossop; Thomas Godden; Ivan Goryachev; Lachlan Groom; Haroun Habeeb; Hunter Hancock; Karol Hausman; Gashon Hussein; Victor Hwang; Brian Ichter; Connor Jacobsen; Szymon Jakubczak; Rowan Jen; Tim Jones; Gregg Kammerer; Ben Katz; Liyiming Ke; Mairbek Khadikov; Chandra Kuchi; Marinda Lamb; Devin LeBlanc; Brendon LeCount; Sergey Levine; Xinyu Li; Adrian Li-Bell; Vladislav Lialin; Zhonglin Liang; Wallace Lim; Yao Lu; Enyu Luo; Vishnu Mano; Nandan Marwaha; Aikys Mongush; Liam Murphy; Suraj Nair; Tyler Patterson; Karl Pertsch; Allen Z. Ren; Gavin Schelske; Charvi Sharma; Baifeng Shi; Lucy Xiaoyang Shi; Laura Smith; Jost Tobias Springenberg; Kyle Stachowicz; Will Stoeckle; Jiaming Tang; Jimmy Tanner; Shalom Tekeste; Marcel Torne; Kyle Vedder; Quan Vuong; Anna Walling; Haohuan Wang; Jason Wang; XuDong Wang; Chris Whalen; Samuel Whitmore; Blake Williams; Charles Xu; Sukwon Yoo; Lili Yu; Wuming Zhang; Zhuoyang Zhang; Ury Zhilinsky

arXiv:2604.15483 · cs.LG · April 28, 2026

TL;DR

${\pi}_{0.7}$ is a versatile robotic foundation model capable of zero-shot generalization, following complex language instructions, and performing a wide range of tasks across different environments by conditioning on diverse context.

Contribution

Introduces ${\pi}_{0.7}$, a robotic foundation model that uses multimodal context conditioning to enable out-of-the-box, multi-task, and cross-embodiment robotic capabilities.

Findings

01. Matches the performance of much more specialized RL-finetuned models on challenging tasks such as operating an espresso machine.

02. Enables zero-shot generalization to unseen tasks and environments.

03. Performs well across multiple robot platforms and task types.

Abstract

We present a new robotic foundation model, called ${\pi}_{0.7}$, that can enable strong out-of-the-box performance in a wide range of scenarios. ${\pi}_{0.7}$ can follow diverse language instructions in unseen environments, including multi-stage tasks with various kitchen appliances, provide zero-shot cross-embodiment generalization, for example enabling a robot to fold laundry without seeing the task before, and perform challenging tasks such as operating an espresso machine out of the box at a level of performance that matches much more specialized RL-finetuned models. The main idea behind ${\pi}_{0.7}$ is to use diverse context conditioning during training. This conditioning information, contained in the prompt, makes it possible to steer the model precisely to perform many tasks with different strategies. It is conditioned not just on a language command that describes what it should…
