MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces
Tianyu Zheng, Ge Zhang, Xingwei Qu, Ming Kuang, Stephen W. Huang, and Zhaofeng He

TL;DR
This paper introduces MORE-3S, a multimodal offline reinforcement learning method that aligns visual and textual data into a shared semantic space, improving decision-making and strategic planning in RL tasks.
Contribution
It recasts offline RL as a supervised learning problem, using multimodal and pre-trained language models to represent states and actions in a shared semantic space.
Findings
Outperforms existing baselines on Atari and OpenAI Gym environments.
Enhances RL training performance through multimodal semantic alignment.
Promotes long-term strategic thinking in RL agents.
Abstract
Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective to the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained language models. Our approach incorporates state information derived from images and action-related data obtained from text, thereby bolstering RL training performance and promoting long-term strategic thinking. We emphasize the contextual understanding of language and demonstrate how decision-making in RL can benefit from aligning states' and actions' representation with languages' representation. Our method significantly outperforms current baselines as evidenced by evaluations conducted on Atari and OpenAI Gym environments. This…
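The idea of scoring actions by how well their text embedding matches a state embedding in one shared space can be illustrated with a small sketch. This is not the authors' architecture: the random vectors stand in for the outputs of pre-trained image and text encoders, and the action names, the temperature value, and the cross-entropy objective are all assumptions chosen for illustration.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def action_logits(state_emb, action_embs, temperature=0.1):
    """Cosine similarity between one state and every action, divided by a temperature."""
    s = normalize(state_emb)
    return [
        sum(a * b for a, b in zip(s, normalize(emb))) / temperature
        for emb in action_embs
    ]


def cross_entropy(logits, target):
    """Supervised loss: negative log-probability of the action logged in the dataset."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


random.seed(0)
dim = 16
action_names = ["noop", "fire", "left", "right"]  # hypothetical action vocabulary

# Stand-in for text embeddings of the action descriptions (assumption: random vectors).
action_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in action_names]

# Offline dataset stand-in: each logged action index, paired with a state embedding
# that already lies near that action's text embedding (i.e. an aligned image encoder).
targets = [random.randrange(len(action_names)) for _ in range(8)]
state_embs = [
    [x + 0.1 * random.gauss(0, 1) for x in action_embs[t]] for t in targets
]

all_logits = [action_logits(s, action_embs) for s in state_embs]
mean_loss = sum(cross_entropy(l, t) for l, t in zip(all_logits, targets)) / len(targets)
predicted = [max(range(len(l)), key=l.__getitem__) for l in all_logits]
```

In a real training loop the encoders would be learned or fine-tuned so that minimizing this loss pulls each state embedding toward the text embedding of the action taken in the offline data; here the alignment is simply built into the toy data.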
Taxonomy
Topics: Reinforcement Learning in Robotics
