Guide Your Agent with Adaptive Multimodal Rewards
Changyeon Kim, Younggyo Seo, Hao Liu, Lisa Lee, Jinwoo Shin, Honglak Lee, Kimin Lee

TL;DR
This paper introduces ARP, a framework that uses multimodal embeddings and natural language instructions to improve agent generalization in unseen environments, effectively addressing goal misgeneralization in imitation learning.
Contribution
The paper proposes a novel adaptive reward mechanism using pre-trained multimodal encoders and fine-tuning to enhance agent generalization with natural language instructions.
Findings
ARP outperforms existing text-conditioned policies in unseen environments.
Multimodal reward signals improve agent adaptation and goal achievement.
Fine-tuning encoders further boosts performance.
Abstract
Developing an agent capable of adapting to unseen environments remains a difficult challenge in imitation learning. This work presents Adaptive Return-conditioned Policy (ARP), an efficient framework designed to enhance the agent's generalization ability using natural language task descriptions and pre-trained multimodal encoders. Our key idea is to calculate the similarity between visual observations and natural language instructions in a pre-trained multimodal embedding space (such as CLIP) and use it as a reward signal. We then train a return-conditioned policy using expert demonstrations labeled with multimodal rewards. Because the multimodal rewards provide adaptive signals at each timestep, ARP effectively mitigates goal misgeneralization. This results in superior generalization performance even when faced with unseen text instructions, compared to existing…
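The reward labeling described in the abstract can be illustrated with a minimal sketch. It assumes the image and text embeddings have already been produced by some pre-trained multimodal encoder (the toy vectors below are hypothetical stand-ins, not CLIP outputs), that the similarity is cosine similarity, and that the return used for conditioning is the undiscounted sum of future rewards. None of these choices are confirmed by the abstract, and the function names are illustrative only.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multimodal_rewards(
    frame_embeddings: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
) -> List[float]:
    """One reward per timestep: similarity of each frame to the instruction."""
    return [cosine_similarity(f, text_embedding) for f in frame_embeddings]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end (assumption)."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    out.reverse()
    return out


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
text = [1.0, 0.0]
frames = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

rewards = multimodal_rewards(frames, text)
rtg = returns_to_go(rewards)
```

In this toy trajectory the frames drift toward the instruction embedding, so the per-step reward rises over time. A return-conditioned policy would then be trained on the demonstration's (observation, return-to-go, action) tuples.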
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
