Language-Model-Assisted Bi-Level Programming for Reward Learning from Internet Videos

Harsh Mahesheka; Zhixian Xie; Zhaoran Wang; Wanxin Jin

arXiv:2410.09286 · cs.RO · October 15, 2024


Open Access

TL;DR

This paper introduces a bi-level programming framework in which a vision-language model and a large language model work together to learn reward functions directly from internet videos, simplifying reward learning for complex behaviors.

Contribution

It presents a framework that combines a VLM and an LLM to learn rewards directly from videos, bypassing the complex, task-specific pipelines otherwise needed to extract and retarget motion data.

Findings

01

Demonstrates effective reward learning from YouTube videos

02

Enables synthesis of complex behaviors from videos of biological experts such as humans and animals

03

Streamlines the reward design process for reinforcement learning

Abstract

Learning from Demonstrations, particularly from biological experts like humans and animals, often encounters significant data acquisition challenges. While recent approaches leverage internet videos for learning, they require complex, task-specific pipelines to extract and retarget motion data for the agent. In this work, we introduce a language-model-assisted bi-level programming framework that enables a reinforcement learning agent to directly learn its reward from internet videos, bypassing dedicated data preparation. The framework includes two levels: an upper level where a vision-language model (VLM) provides feedback by comparing the learner's behavior with expert videos, and a lower level where a large language model (LLM) translates this feedback into reward updates. The VLM and LLM collaborate within this bi-level framework, using a "chain rule" approach to derive a valid…
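To make the division of labor described in the abstract concrete, the following is a minimal, hypothetical sketch of the feedback loop; it is not the authors' implementation. All names (`train_policy`, `vlm_feedback`, `llm_update`, `EXPERT_BEHAVIOR`), the numeric behavior summaries, and the fixed step size are illustrative assumptions. In the actual framework a VLM compares rendered rollouts with expert videos and an LLM rewrites the reward, whereas here both models are replaced by rule-based stand-ins so the loop runs offline. The "chain rule" reading is likewise an interpretation: the VLM verdict plays the role of how the imitation objective changes with behavior, and the behavior-to-reward-term mapping plays the role of how behavior changes with the reward.

```python
# Hypothetical sketch of a VLM/LLM bi-level reward-learning loop.
# Both language models are replaced by simple rules; nothing here calls a real model.

# Assumed numeric summary of what the expert video shows (illustrative only).
EXPERT_BEHAVIOR = {"speed": 2.0, "height": 0.5}

def train_policy(weights: dict[str, float]) -> dict[str, float]:
    """Stand-in for RL training of the agent under the current reward.

    Assumption: each behavior feature grows linearly with its reward weight.
    """
    return {k: 1.0 * w for k, w in weights.items()}

def vlm_feedback(behavior: dict[str, float], expert: dict[str, float],
                 tol: float = 0.05) -> dict[str, str]:
    """Upper level stand-in for the VLM: compares learner and expert behavior.

    Returns a qualitative verdict per feature (the 'dL/d behavior' factor).
    """
    feedback = {}
    for feature, target in expert.items():
        diff = target - behavior[feature]
        if abs(diff) <= tol:
            feedback[feature] = "ok"
        elif diff > 0:
            feedback[feature] = "too low"
        else:
            feedback[feature] = "too high"
    return feedback

# Stand-in for the LLM's knowledge of which reward term drives which behavior
# (the 'd behavior / d reward' factor). Here feature k is driven by weight k.
BEHAVIOR_TO_REWARD_TERM = {"speed": "speed", "height": "height"}

def llm_update(weights: dict[str, float], feedback: dict[str, str],
               step: float = 0.1) -> dict[str, float]:
    """Lower level stand-in for the LLM: turns feedback into a reward edit."""
    new_weights = dict(weights)
    for feature, verdict in feedback.items():
        term = BEHAVIOR_TO_REWARD_TERM[feature]
        if verdict == "too low":
            new_weights[term] += step
        elif verdict == "too high":
            new_weights[term] -= step
    return new_weights

def bilevel_reward_learning(weights: dict[str, float], expert: dict[str, float],
                            max_iters: int = 100) -> tuple[dict[str, float], int]:
    """Alternate policy training, VLM feedback, and LLM reward updates."""
    for it in range(max_iters):
        behavior = train_policy(weights)             # agent trained on current reward
        feedback = vlm_feedback(behavior, expert)    # compare with the expert video
        if all(v == "ok" for v in feedback.values()):
            return weights, it                       # behavior matches the expert
        weights = llm_update(weights, feedback)      # feedback -> reward update
    return weights, max_iters
```

For example, `bilevel_reward_learning({"speed": 1.0, "height": 1.0}, EXPERT_BEHAVIOR)` nudges the speed weight up and the height weight down until both behaviors fall within tolerance of the expert summary. The rule-based stand-ins keep the sketch deterministic; swapping in real model queries would change only `vlm_feedback` and `llm_update`, not the structure of the loop.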


Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Reinforcement Learning in Robotics