CLIP-RL: Aligning Language and Policy Representations for Task Transfer in Reinforcement Learning
Chainesh Gautam, Raghuram Bharadwaj Diddigi

TL;DR
This paper introduces CLIP-RL, a method that aligns language and policy representations in reinforcement learning to enable efficient transfer across multiple tasks by creating a unified embedding space.
Contribution
It extends CLIP principles to reinforcement learning, establishing a shared representation space for language and policies to improve task transfer efficiency.
Findings
Demonstrates faster transfer across tasks
Creates a unified representation space for language and policies
Adapts the CLIP alignment principle to the RL setting
Abstract
Recently, there has been an increasing need to develop agents capable of solving multiple tasks within the same environment, especially when these tasks are naturally associated with language. In this work, we propose a novel approach that leverages combinations of pre-trained (language, policy) pairs to establish an efficient transfer pipeline. Our algorithm is inspired by the principles of Contrastive Language-Image Pretraining (CLIP) in Computer Vision, which aligns representations across different modalities under the philosophy that "two modalities representing the same concept should have similar representations." The central idea here is that the instruction and corresponding policy of a task represent the same concept, the task itself, in two different modalities. Therefore, by extending the idea of CLIP to RL, our method creates a unified representation space for natural…
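As an illustration of the alignment idea the abstract describes, the sketch below computes a CLIP-style symmetric contrastive loss over a batch of (language, policy) embedding pairs. It is not the authors' implementation: the abstract does not specify the loss, the encoders, or how a policy is turned into a vector, so the function names (`clip_style_loss`, `cross_entropy_diag`), the temperature of 0.07, and the toy 3-dimensional embeddings are all assumptions made for this example.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def clip_style_loss(lang_embs, policy_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i in one list matches pair i in the other.

    Both inputs are hypothetical, pre-computed embeddings; how they are
    produced is outside the scope of this sketch.
    """
    lang = [normalize(v) for v in lang_embs]
    pol = [normalize(v) for v in policy_embs]
    # logits[i][j] = cosine similarity of instruction i and policy j, scaled.
    logits = [[sum(a * b for a, b in zip(l, p)) / temperature for p in pol]
              for l in lang]
    transposed = [list(col) for col in zip(*logits)]
    # Average the language-to-policy and policy-to-language directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Toy data: each policy embedding is close to its own instruction embedding.
lang = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
mismatched = [matched[1], matched[2], matched[0]]  # pairs deliberately shifted

aligned_loss = clip_style_loss(lang, matched)
shuffled_loss = clip_style_loss(lang, mismatched)
print(aligned_loss < shuffled_loss)
```

The loss is low when each instruction is most similar to its own task's policy and high when the pairs are shifted, which is the property a shared language-policy embedding space would need under these assumptions.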
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics
