A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel, Xinchen Yin, Wenlong Huang, Shubham Garg, Hooshang Nayyeri, Li Fei-Fei, Svetlana Lazebnik, Yunzhu Li

TL;DR
This paper presents IKER, a novel visual reward function generated by vision-language models, enabling robots to learn complex multi-step manipulation tasks through a real-to-sim-to-real loop with iterative feedback.
Contribution
We introduce IKER, a dynamic, visually grounded reward function that leverages VLMs for iterative task specification and reinforcement learning in robotic manipulation.
Findings
IKER enables multi-step manipulation in diverse scenarios.
Robots can perform error recovery and strategy adjustments.
The approach improves task success in dynamic environments.
Abstract
Task specification for robotic manipulation in open-world environments is challenging, requiring flexible and adaptive objectives that align with human intentions and can evolve through iterative feedback. We introduce Iterative Keypoint Reward (IKER), a visually grounded, Python-based reward function that serves as a dynamic task specification. Our framework leverages VLMs to generate and refine these reward functions for multi-step manipulation tasks. Given RGB-D observations and free-form language instructions, we sample keypoints in the scene and generate a reward function conditioned on these keypoints. IKER operates on the spatial relationships between keypoints, leveraging commonsense priors about the desired behaviors, and enabling precise SE(3) control. We reconstruct real-world scenes in simulation and use the generated rewards to train reinforcement learning (RL) policies,…
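To make the idea of a Python reward function over keypoints concrete, here is a minimal sketch. It is not the authors' generated code: the function name, the keypoint indices, the offset, and the coordinates are all hypothetical, chosen only to illustrate a reward that depends on the spatial relationship between two keypoints.

```python
import math


def keypoint_reward(keypoints, moved_idx, anchor_idx, offset=(0.0, 0.0, 0.05)):
    """Illustrative reward: negative distance from one keypoint to a target.

    The target is placed at `offset` (meters, assumed) from the anchor
    keypoint, so the reward is highest (zero) when the moved keypoint
    sits exactly at that relative position.
    """
    anchor = keypoints[anchor_idx]
    target = tuple(a + o for a, o in zip(anchor, offset))
    return -math.dist(keypoints[moved_idx], target)


# Hypothetical scene: keypoint 0 lies on an object to move,
# keypoint 1 lies on the surface it should be placed above.
keypoints_before = [(0.10, 0.00, 0.02), (0.40, 0.20, 0.00)]
keypoints_after = [(0.40, 0.20, 0.05), (0.40, 0.20, 0.00)]

reward_before = keypoint_reward(keypoints_before, moved_idx=0, anchor_idx=1)
reward_after = keypoint_reward(keypoints_after, moved_idx=0, anchor_idx=1)
```

In this sketch the reward rises toward zero as the moved keypoint approaches its target, which is the kind of dense signal an RL policy can optimize. A reward for a multi-step task would presumably be regenerated or updated as the scene changes, which this single-step example does not attempt to show.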
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms
Methods: ALIGN
