STEER: Flexible Robotic Manipulation via Dense Language Grounding
Laura Smith, Alex Irpan, Montserrat Gonzalez Arenas, Sean Kirmani, Dmitry Kalashnikov, Dhruv Shah, Ted Xiao

TL;DR
STEER is a robotic framework that combines high-level language reasoning with low-level control, enabling flexible adaptation to new tasks and situations through modular, language-grounded policies.
Contribution
STEER trains language-grounded policies on modular manipulation skills with dense annotation, so a human or VLM can recombine those skills to handle unseen tasks without additional data collection or training.
Findings
Robots can synthesize new behaviors by combining learned skills.
The framework enables adaptation to new tasks without extra training.
Language grounding improves task generalization.
Abstract
The complexity of the real world demands robotic systems that can intelligently adapt to unseen situations. We present STEER, a robot learning framework that bridges high-level, commonsense reasoning with precise, flexible low-level control. Our approach translates complex situational awareness into actionable low-level behavior through training language-grounded policies with dense annotation. By structuring policy training around fundamental, modular manipulation skills expressed in natural language, STEER exposes an expressive interface for humans or Vision-Language Models (VLMs) to intelligently orchestrate the robot's behavior by reasoning about the task and context. Our experiments demonstrate the skills learned via STEER can be combined to synthesize novel behaviors to adapt to new situations or perform completely new tasks without additional data collection or training.
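The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the skill phrases, the `Observation` type, and the functions `language_conditioned_policy`, `rule_based_orchestrator`, and `run` are all hypothetical names chosen for illustration. A simple rule-based function stands in for the human or VLM orchestrator, and the low-level policy returns a log string rather than motor commands.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical skill vocabulary. The actual instruction set used to train
# STEER is not listed in this summary, so these phrases are assumptions.
SKILLS = (
    "grasp from the top",
    "grasp from the side",
    "lift",
    "rotate clockwise",
    "place",
)


@dataclass
class Observation:
    """Stand-in for camera input: a short text description of the scene."""
    description: str


def language_conditioned_policy(instruction: str, obs: Observation) -> str:
    """Stand-in for a learned low-level policy.

    A real policy would map (instruction, image) to robot actions; this one
    only validates the instruction and returns a log string.
    """
    if instruction not in SKILLS:
        raise ValueError(f"unknown skill: {instruction!r}")
    return f"executing '{instruction}' given {obs.description}"


def rule_based_orchestrator(task: str, obs: Observation) -> List[str]:
    """Stand-in for the human or VLM that reasons about task and context.

    It picks a grasp strategy from the scene and composes a skill sequence.
    """
    grasp = "grasp from the side" if "upright" in obs.description else "grasp from the top"
    plan = [grasp, "lift"]
    if "pour" in task:
        plan.append("rotate clockwise")
    plan.append("place")
    return plan


def run(
    task: str,
    obs: Observation,
    orchestrator: Callable[[str, Observation], List[str]] = rule_based_orchestrator,
) -> List[str]:
    """Execute each planned skill with the low-level policy, in order."""
    return [language_conditioned_policy(step, obs) for step in orchestrator(task, obs)]


if __name__ == "__main__":
    scene = Observation("an upright cup on a table")
    for line in run("pour the cup", scene):
        print(line)
```

In this sketch the orchestrator is passed in as a parameter, so the rule-based stand-in could be replaced with a call to a VLM without changing the policy side. This mirrors the abstract's point that new behaviors come from recombining existing skills rather than from retraining.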
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
