${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
Physical Intelligence, Bo Ai, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Greg Balke, Kevin Black, George Bokinsky, Shihao Cao, Thomas Charbonnier, Vedant Choudhary, Foster Collins, Ken Conley, Grace Connors, James Darpinian, Karan Dhabalia, Maitrayee Dhaka, Jared DiCarlo

TL;DR
${\pi}_{0.7}$ is a versatile robotic foundation model that uses diverse context conditioning to generalize zero-shot, follow complex language instructions, and perform diverse tasks across different environments.
Contribution
Introduces ${\pi}_{0.7}$, a robotic foundation model that uses multimodal context conditioning to enable out-of-the-box, multi-task, and cross-embodiment robotic capabilities.
Findings
Matches the performance of much more specialized RL-finetuned models on challenging tasks such as operating an espresso machine.
Enables zero-shot generalization to unseen tasks and environments.
Performs well across multiple robot platforms and task types.
Abstract
We present a new robotic foundation model, called ${\pi}_{0.7}$, that enables strong out-of-the-box performance in a wide range of scenarios. ${\pi}_{0.7}$ can follow diverse language instructions in unseen environments, including multi-stage tasks with various kitchen appliances. It provides zero-shot cross-embodiment generalization, for example enabling a robot to fold laundry without having seen the task before. It also performs challenging tasks, such as operating an espresso machine, out of the box at a level that matches much more specialized RL-finetuned models. The main idea behind ${\pi}_{0.7}$ is to use diverse context conditioning during training. This conditioning information, contained in the prompt, makes it possible to steer the model precisely to perform many tasks with different strategies. It is conditioned not just on a language command that describes what it should…
