TL;DR
This paper introduces LVLM2P, a framework that distills large vision-language models into reinforcement learning agents, significantly improving sample efficiency and enabling more practical deployment in resource-constrained environments.
Contribution
LVLM2P uses a large vision-language model as a teacher to accelerate RL training and to remove the need for manual environment descriptions.
Findings
LVLM2P significantly improves the sample efficiency of RL algorithms.
The approach reduces less meaningful exploration in the early stages of learning.
It enables RL agents to operate without manual environment annotations.
Abstract
Recent research highlights the potential of multimodal foundation models in tackling complex decision-making challenges. However, their large parameter counts make real-world deployment resource-intensive and often impractical for constrained systems. Reinforcement learning (RL) shows promise for task-specific agents but suffers from high sample complexity, which limits practical applications. To address these challenges, we introduce LVLM to Policy (LVLM2P), a novel framework that distills knowledge from large vision-language models (LVLMs) into more efficient RL agents. Our approach uses the LVLM as a teacher that provides instructional actions based on trajectories collected by the RL agent. This reduces less meaningful exploration in the early stages of learning and thereby significantly accelerates the agent's learning progress. Additionally, by leveraging the LVLM to suggest actions…
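The teacher-student idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the loss or the environment, so the cross-entropy distillation loss, the toy corridor environment, and every name here (`teacher_action`, `collect_trajectory`, `distill_step`) are assumptions made for illustration. In particular, `teacher_action` is a hypothetical stand-in for a call to an LVLM.

```python
import math
import random

N_STATES, N_ACTIONS = 5, 2  # assumed toy corridor: action 1 moves right, action 0 moves left


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_action(state):
    """Hypothetical stand-in for an LVLM query; here it always suggests moving right."""
    return 1


def collect_trajectory(logits, rng, max_steps=20):
    """Roll out the current student policy and return the non-terminal states it visited."""
    state, visited = 0, []
    for _ in range(max_steps):
        visited.append(state)
        probs = softmax(logits[state])
        action = rng.choices(range(N_ACTIONS), weights=probs)[0]
        step = 1 if action == 1 else -1
        state = max(0, min(N_STATES - 1, state + step))
        if state == N_STATES - 1:  # reached the goal at the right end
            break
    return visited


def distill_step(logits, states, lr=0.5):
    """Sequential SGD on the cross-entropy between the student policy and teacher actions."""
    loss = 0.0
    for s in states:
        probs = softmax(logits[s])
        a_star = teacher_action(s)
        loss += -math.log(probs[a_star])
        for a in range(N_ACTIONS):
            # Gradient of -log softmax(logits)[a_star] w.r.t. logits[a]
            grad = probs[a] - (1.0 if a == a_star else 0.0)
            logits[s][a] -= lr * grad
    return loss / len(states)


rng = random.Random(0)
logits = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # tabular student policy
losses = []
for _ in range(30):
    states = collect_trajectory(logits, rng)       # student gathers its own experience
    losses.append(distill_step(logits, states))    # teacher labels it; student imitates
```

The point of the sketch is the data flow: the student, not the teacher, chooses which states get visited, and the teacher only labels those states with suggested actions. A full agent would presumably combine such an imitation term with a standard RL objective; how LVLM2P does this is not specified in the excerpt above.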
