GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy
Peiyan Li, Hongtao Wu, Yan Huang, Chilam Cheang, Liang Wang, Tao Kong

TL;DR
GR-MG learns from partially-annotated data by conditioning its policy on a text instruction and a goal image. At inference, a diffusion-based image-editing model generates the goal image from the text, which improves robot manipulation generalization with fewer annotations.
Contribution
Introduces GR-MG, a novel method that combines text instructions and goal images, including generated images, to enhance robot learning from partially-annotated data.
Findings
Improves the average number of tasks completed in a row of 5 from 3.35 to 4.04 in simulation.
Increases the success rate from 68.7% to 78.1% in real-robot experiments.
Outperforms baseline methods in few-shot skill learning.
Abstract
The robotics community has consistently aimed to achieve generalizable robot manipulation with flexible natural language instructions. One primary challenge is that obtaining robot trajectories fully annotated with both actions and texts is time-consuming and labor-intensive. However, partially-annotated data, such as human activity videos without action labels and robot trajectories without text labels, are much easier to collect. Can we leverage these data to enhance the generalization capabilities of robots? In this paper, we propose GR-MG, a novel method which supports conditioning on a text instruction and a goal image. During training, GR-MG samples goal images from trajectories and conditions on both the text and the goal image or solely on the image when text is not available. During inference, where only the text is provided, GR-MG generates the goal image via a diffusion-based…
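The training-time conditioning described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Trajectory`, `Condition`, and `sample_condition`, the string stand-ins for images, and the `max_offset` window used to pick the goal frame are all assumptions introduced here. It only shows the idea that a goal image is sampled from the same trajectory and that text is used when a label exists and dropped otherwise.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trajectory:
    frames: List[str]            # stand-ins for image observations
    text: Optional[str] = None   # None when the trajectory has no text label


@dataclass
class Condition:
    current_frame: str
    goal_frame: str
    text: Optional[str]
    use_text: bool               # False means: condition on the goal image alone


def sample_condition(traj: Trajectory, rng: random.Random,
                     max_offset: int = 5) -> Condition:
    """Pick a current step and a goal frame from later in the same trajectory."""
    if len(traj.frames) < 2:
        raise ValueError("need at least two frames to sample a goal")
    if max_offset < 1:
        raise ValueError("max_offset must be at least 1")
    t = rng.randrange(len(traj.frames) - 1)
    # Hypothetical choice: draw the goal uniformly from the next `max_offset` frames.
    last = min(t + max_offset, len(traj.frames) - 1)
    goal_idx = rng.randint(t + 1, last)
    return Condition(
        current_frame=traj.frames[t],
        goal_frame=traj.frames[goal_idx],
        text=traj.text,
        use_text=traj.text is not None,
    )


rng = random.Random(0)
frames = [f"frame_{i}" for i in range(10)]
labeled = Trajectory(frames=frames, text="pick up the cup")
unlabeled = Trajectory(frames=frames)   # e.g. a trajectory without a text label

cond_labeled = sample_condition(labeled, rng)
cond_unlabeled = sample_condition(unlabeled, rng)
```

With this layout, fully annotated and text-free trajectories go through the same sampling path, and only the `use_text` flag differs. At inference the goal frame would come from the diffusion-based generator the abstract mentions rather than from a recorded trajectory, which this sketch does not cover.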
Taxonomy
Topics: Machine Learning in Healthcare · Bayesian Modeling and Causal Inference · Simulation Techniques and Applications
