Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation

Donghoon Lee; Tung M. Luu; Younghwan Lee; Chang D. Yoo

arXiv:2505.11221 · cs.LG · May 19, 2025

TL;DR

This paper introduces LVLM2P, a framework that distills large vision-language models into reinforcement learning agents, significantly improving sample efficiency and enabling more practical deployment in resource-constrained environments.

Contribution

LVLM2P uses a large vision-language model as a teacher to accelerate RL training and to remove the need for manual environment descriptions.

Findings

1. LVLM2P significantly improves sample efficiency of RL algorithms.
2. The approach reduces early exploration inefficiencies.
3. It enables RL agents to operate without manual environment annotations.

Abstract

Recent research highlights the potential of multimodal foundation models in tackling complex decision-making challenges. However, their large parameters make real-world deployment resource-intensive and often impractical for constrained systems. Reinforcement learning (RL) shows promise for task-specific agents but suffers from high sample complexity, limiting practical applications. To address these challenges, we introduce LVLM to Policy (LVLM2P), a novel framework that distills knowledge from large vision-language models (LVLM) into more efficient RL agents. Our approach leverages the LVLM as a teacher, providing instructional actions based on trajectories collected by the RL agent, which helps reduce less meaningful exploration in the early stages of learning, thereby significantly accelerating the agent's learning progress. Additionally, by leveraging the LVLM to suggest actions…
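
The teacher-student idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss, so the cross-entropy distillation term, the plain gradient step on raw logits, and the names `query_teacher`, `distill_step`, and `goal_direction` are all assumptions made for illustration. The sketch only shows how repeatedly matching a teacher-suggested action shifts a student policy toward that action.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(logits, teacher_action):
    """Cross-entropy between the student policy and a one-hot teacher action.

    Assumed loss form; the source does not specify one.
    """
    return -math.log(softmax(logits)[teacher_action])


def distillation_grad(logits, teacher_action):
    """Gradient of that cross-entropy w.r.t. the logits: softmax - one_hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == teacher_action else 0.0) for i, p in enumerate(probs)]


def query_teacher(observation):
    """Hypothetical stand-in for the LVLM call; returns an action index.

    A real teacher would be prompted with observations from the agent's
    trajectory. Here the "suggestion" is simply read from the observation.
    """
    return observation["goal_direction"]


def distill_step(logits, teacher_action, lr=0.5):
    """One gradient-descent step on the distillation loss alone."""
    grad = distillation_grad(logits, teacher_action)
    return [x - lr * g for x, g in zip(logits, grad)]


# Toy run: a 4-action student that starts with a uniform policy.
logits = [0.0, 0.0, 0.0, 0.0]
observation = {"goal_direction": 2}  # hypothetical observation format
action = query_teacher(observation)

before = softmax(logits)[action]
for _ in range(20):
    logits = distill_step(logits, action)
after = softmax(logits)[action]

print(f"P(teacher action) before: {before:.3f}, after: {after:.3f}")
```

In a full agent, a term like this would presumably be combined with the usual RL objective, so that teacher guidance matters most early in training while the reward signal is still sparse; how the paper weights the two is not stated in the abstract.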

Code & Models

Repositories

i22024/lvlm2p (PyTorch, official)
