A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards

Shivansh Patel; Xinchen Yin; Wenlong Huang; Shubham Garg; Hooshang Nayyeri; Li Fei-Fei; Svetlana Lazebnik; Yunzhu Li

arXiv:2502.08643·cs.RO·February 19, 2025

TL;DR

This paper presents IKER, a novel visual reward function generated by vision-language models, enabling robots to learn complex multi-step manipulation tasks through a real-to-sim-to-real loop with iterative feedback.

Contribution

We introduce IKER, a dynamic, visually grounded reward function that leverages VLMs for iterative task specification and reinforcement learning in robotic manipulation.

Findings

01. IKER enables multi-step manipulation in diverse scenarios.

02. Robots can perform error recovery and strategy adjustments.

03. The approach improves task success in dynamic environments.

Abstract

Task specification for robotic manipulation in open-world environments is challenging, requiring flexible and adaptive objectives that align with human intentions and can evolve through iterative feedback. We introduce Iterative Keypoint Reward (IKER), a visually grounded, Python-based reward function that serves as a dynamic task specification. Our framework leverages VLMs to generate and refine these reward functions for multi-step manipulation tasks. Given RGB-D observations and free-form language instructions, we sample keypoints in the scene and generate a reward function conditioned on these keypoints. IKER operates on the spatial relationships between keypoints, leveraging commonsense priors about the desired behaviors, and enabling precise SE(3) control. We reconstruct real-world scenes in simulation and use the generated rewards to train reinforcement learning (RL) policies,…
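As a rough illustration of what a reward "conditioned on keypoints" might look like, the sketch below scores a scene by how close one keypoint is to a target position defined relative to another keypoint. It is not the code that the VLM generates in the paper: the function name `keypoint_reward`, the index arguments, the 5 cm vertical offset, and the choice of negative Euclidean distance are all assumptions made for illustration, and keypoints are assumed to be 3D positions in meters in a shared world frame.

```python
import math
from typing import Sequence

Point = Sequence[float]  # (x, y, z) position in meters, world frame (assumed)


def keypoint_reward(
    keypoints: Sequence[Point],
    moving_idx: int = 0,                   # hypothetical: keypoint on the object being moved
    anchor_idx: int = 1,                   # hypothetical: keypoint on the reference object
    offset: Point = (0.0, 0.0, 0.05),      # hypothetical: desired placement 5 cm above the anchor
) -> float:
    """Return a reward that grows as the moving keypoint approaches its target.

    The target is expressed relative to the anchor keypoint, so the reward
    depends only on the spatial relationship between the two keypoints.
    """
    anchor = keypoints[anchor_idx]
    target = [a + o for a, o in zip(anchor, offset)]
    # Negative Euclidean distance: 0 at the target, more negative farther away.
    return -math.dist(keypoints[moving_idx], target)


# Usage: keypoint 0 starts 30 cm to the side of keypoint 1, then moves closer.
far = keypoint_reward([(0.3, 0.0, 0.0), (0.0, 0.0, 0.0)])
near = keypoint_reward([(0.1, 0.0, 0.05), (0.0, 0.0, 0.0)])
print(near > far)
```

In an iterative setting, a function of this shape could be regenerated after each step with different indices or offsets as the scene changes; how the paper actually structures that loop is not specified in the text above.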

Taxonomy

Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms

Methods: ALIGN