LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim, Karl Pertsch

TL;DR
This paper introduces LiFT, a framework where foundation models serve as teachers to guide reinforcement learning agents in acquiring meaningful multi-task skills without human feedback, demonstrating success in complex environments.
Contribution
LiFT is the first framework to utilize foundation models as teachers for unsupervised reinforcement learning, enabling semantic skill acquisition in open-ended environments.
Findings
LiFT learned semantically meaningful skills in the MineDojo environment.
It outperformed prior unsupervised skill discovery methods.
The authors identified challenges in using off-the-shelf foundation models as teachers.
Abstract
We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by providing reward feedback. We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment while prior unsupervised skill discovery methods struggle. Additionally, we discuss observed challenges of using off-the-shelf foundation models as teachers and our efforts to address them.
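The teacher loop described in the abstract (a language model proposes tasks, a vision-language model scores the agent's behavior, and the agent learns a language-conditioned policy from those scores) can be illustrated with a toy sketch. This is not the authors' implementation: `propose_tasks`, `vlm_reward`, `ToyEnv`, and `train` are hypothetical stand-ins, the "LLM" and "VLM" are replaced by simple string functions, and the policy is a tabular bandit rather than a deep RL agent in MineDojo.

```python
import random

def propose_tasks(available_objects):
    """Hypothetical stand-in for an LLM that proposes tasks grounded in the environment."""
    return [f"collect {obj}" for obj in sorted(available_objects)]

def vlm_reward(observation, instruction):
    """Hypothetical stand-in for a VLM reward: 1.0 if the observation matches the task."""
    target = instruction.split(" ", 1)[1]
    return 1.0 if target in observation else 0.0

class ToyEnv:
    """One-step environment (an assumption): each action yields a text observation."""
    def __init__(self, objects):
        self.objects = list(objects)

    def step(self, action):
        return f"agent holds {self.objects[action]}"

def train(objects, episodes=500, epsilon=0.2, lr=0.5, seed=0):
    """Learn a tabular, instruction-conditioned value table from teacher rewards only."""
    rng = random.Random(seed)
    env = ToyEnv(objects)
    tasks = propose_tasks(objects)
    n_actions = len(objects)
    q = {task: [0.0] * n_actions for task in tasks}  # q[instruction][action]
    for _ in range(episodes):
        task = rng.choice(tasks)  # teacher picks the instruction
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: q[task][a])  # exploit
        observation = env.step(action)
        reward = vlm_reward(observation, task)  # no human feedback involved
        q[task][action] += lr * (reward - q[task][action])
    return q

def greedy_policy(q, objects):
    """Map each instruction to the object chosen by the greedy action."""
    return {
        task: objects[max(range(len(objects)), key=lambda a: values[a])]
        for task, values in q.items()
    }

if __name__ == "__main__":
    objects = ["wood", "milk", "wool"]
    print(greedy_policy(train(objects), objects))
```

The point of the sketch is the data flow: instructions and rewards both come from teacher functions, so no human-written reward or label enters the loop. In the paper's setting the teachers are off-the-shelf foundation models, whose imperfect outputs are among the challenges the authors discuss.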
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
