Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

Yanting Yang; Minghao Chen; Qibo Qiu; Jiahao Wu; Wenxiao Wang; Binbin Lin; Ziyu Guan; Xiaofei He

arXiv:2407.14872 · cs.CV · July 23, 2024

TL;DR

This paper introduces Adapt2Reward, a method that leverages failure prompts and clustering of failure videos to create a generalizable language-conditioned reward function for robots, enabling better adaptation to new environments and instructions.
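The TL;DR describes a language-conditioned reward function. A common way to read such a reward off a video-language model, assumed here rather than taken from the paper, is the cosine similarity between the embedding of a rollout video and the embedding of the instruction text; a planner can then rank candidate rollouts by that score. The sketch below shows only this scoring step on toy vectors. The encoders are not shown, and the names `language_conditioned_reward` and `best_rollout` are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def language_conditioned_reward(video_emb: Sequence[float],
                                instruction_emb: Sequence[float]) -> float:
    """Hypothetical reward: how well a rollout video matches the instruction.

    Assumes both vectors come from the video and text encoders of a
    video-language model that share one embedding space (encoders not shown).
    """
    return cosine_similarity(video_emb, instruction_emb)


def best_rollout(rollout_embs: Dict[str, Sequence[float]],
                 instruction_emb: Sequence[float]) -> str:
    """Planning use: return the name of the highest-reward candidate rollout."""
    return max(rollout_embs,
               key=lambda name: language_conditioned_reward(rollout_embs[name],
                                                            instruction_emb))


# Toy 3-d embeddings (illustrative only, not real model outputs).
instruction = [1.0, 0.0, 0.0]
rollouts = {"a": [0.9, 0.1, 0.0], "b": [0.0, 1.0, 0.0]}
print(best_rollout(rollouts, instruction))  # a
```

Keeping the reward a pure function of two embeddings makes it usable both as a dense reinforcement-learning signal and as a scoring function for planning.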

Contribution

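It presents a novel approach for transferring video-language models into language-conditioned reward functions, using robot video from only a few tasks in a single environment together with clustering of failure videos, to improve generalization to new environments and instructions.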
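The TL;DR and the contribution statement both mention clustering failure videos. As a hedged illustration of that step only, the sketch below groups failure-video embeddings with plain k-means (Lloyd's algorithm), so that each cluster could stand for one failure mode. The choice of k-means, Euclidean distance, first-k initialization, and the function name `kmeans` are assumptions for illustration; the paper's actual clustering procedure may differ.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iters: int = 100) -> Tuple[Optional[List[int]], List[Vector]]:
    """Plain k-means; returns (cluster label per point, centroids).

    Centroids are initialised deterministically from the first k points.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Vector] = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iters):
        # Assignment step: each failure embedding joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = _mean(members)
    return labels, centroids


# Toy failure-video embeddings forming two obvious groups (illustrative only).
failures = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, centroids = kmeans(failures, k=2)
print(labels)  # [0, 0, 1, 1]
```

In practice k (the number of failure modes) would have to be chosen or tuned; the deterministic initialization here only keeps the toy example reproducible.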

Findings

01. Outperforms existing reward models in new environments
02. Effective in distinguishing success and failure modes
03. Enhances robot planning and reinforcement learning

Abstract

For a general-purpose robot to operate in the real world, it must be able to execute a broad range of instructions across various environments. Central to reinforcement learning and planning for such robotic agents is a generalizable reward function. Recent advances in vision-language models, such as CLIP, have shown remarkable performance in deep learning, paving the way for open-domain visual recognition. However, collecting data on robots executing various language instructions across multiple environments remains a challenge. This paper aims to transfer video-language models with robust generalization into a generalizable language-conditioned reward function, using only robot video data from a minimal number of tasks in a single environment. Unlike common robotic datasets used for training reward functions, human video-language datasets rarely contain trivial failure…
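The abstract frames the task as adapting a contrastively pretrained video-language model, in the spirit of CLIP, into a reward function, and the TL;DR credits failure prompts. One hypothetical way to express that idea as a training objective is a softmax cross-entropy in which a rollout video is scored against the instruction text plus a set of failure-prompt embeddings (for example, one per failure cluster): a successful video should select the instruction, a failed video its own failure prompt. This is a minimal sketch under those assumptions, not the paper's published loss; the function name `failure_prompt_loss`, the `target` convention, and the temperature of 0.1 are illustrative choices.

```python
import math
from typing import Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def failure_prompt_loss(video_emb: Sequence[float],
                        instruction_emb: Sequence[float],
                        failure_prompt_embs: Sequence[Sequence[float]],
                        target: int,
                        temperature: float = 0.1) -> float:
    """Hypothetical cross-entropy over [instruction] + failure prompts.

    target == 0      -> the video succeeded and should match the instruction.
    target == i + 1  -> the video failed and belongs to failure prompt i.
    """
    candidates = [instruction_emb, *failure_prompt_embs]
    if not 0 <= target < len(candidates):
        raise ValueError("target out of range")
    # Temperature-scaled similarity logits, as in CLIP-style contrastive training.
    logits = [_cosine(video_emb, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Toy 2-d embeddings (illustrative only).
instruction = [1.0, 0.0]
failure_prompts = [[0.0, 1.0]]
video = [1.0, 0.0]
print(failure_prompt_loss(video, instruction, failure_prompts, target=0))  # close to 0
print(failure_prompt_loss(video, instruction, failure_prompts, target=1))  # about 10
```

Treating failure prompts as extra candidate texts gives the model explicit negatives for failed executions, which a dataset containing only successful demonstrations would not otherwise supply.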

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training