Effective Multimodal Reinforcement Learning with Modality Alignment and Importance Enhancement

Jinming Ma; Feng Wu; Yingfeng Chen; Xianpeng Ji; Yu Ding

arXiv:2302.09318 · cs.LG · February 21, 2023 · 1 citation

TL;DR

This paper introduces a novel multimodal reinforcement learning method that aligns modalities according to their similarity and enhances them according to their importance to the RL task. This yields a better state representation and better policy learning in complex environments.

Contribution

It proposes a new approach to multimodal RL that addresses the heterogeneity and varying importance of modalities through modality alignment and importance enhancement.

Findings

1. Outperforms state-of-the-art methods in learning speed
2. Achieves higher policy quality in multimodal tasks
3. Enhances state representation learning in multimodal environments

Abstract

Many real-world applications require an agent to make robust and deliberate decisions with multimodal information (e.g., robots with multi-sensory inputs). However, training such an agent via reinforcement learning (RL) is very challenging due to the heterogeneity and dynamic importance of different modalities. Specifically, we observe that these issues make it difficult for conventional RL methods to learn a useful state representation in end-to-end training with multimodal information. To address this, we propose a novel multimodal RL approach that aligns modalities according to their similarity and enhances them according to their importance to the RL task. By doing so, we are able to learn an effective state representation and consequently improve the RL training process. We test our approach on several multimodal RL domains, showing that it outperforms…
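The abstract describes the method only at a high level (alignment by similarity, enhancement by task importance), and the paper's concrete mechanisms are not given here. The sketch below is a generic, hypothetical illustration of those two ideas, not the authors' implementation. It assumes each modality has already been encoded into a fixed-length vector, uses mean pairwise cosine distance as a stand-in alignment term, and uses a softmax over per-modality scores as a stand-in importance weighting. The names `alignment_loss`, `importance_weights`, and `fuse` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (epsilon avoids /0)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)


def alignment_loss(embeddings: Sequence[Vector]) -> float:
    """Hypothetical alignment term: mean (1 - cosine) over all modality pairs.

    Lower values mean the modality embeddings point in similar directions,
    i.e. they live in a more shared (aligned) representation space.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


def importance_weights(scores: Sequence[float]) -> List[float]:
    """Hypothetical importance weighting: numerically stable softmax over scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings: Sequence[Vector], scores: Sequence[float]) -> Vector:
    """Importance-weighted sum of modality embeddings, giving one state vector."""
    w = importance_weights(scores)
    dim = len(embeddings[0])
    return [
        sum(w[k] * embeddings[k][i] for k in range(len(embeddings)))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Toy, made-up embeddings for three modalities of one observation.
    vision = [1.0, 0.0, 0.0]
    touch = [0.9, 0.1, 0.0]
    audio = [0.0, 1.0, 0.0]
    # Made-up per-modality scores; in a real agent these would be learned.
    scores = [2.0, 1.0, 0.0]

    mods = [vision, touch, audio]
    print("alignment loss:", alignment_loss(mods))
    print("weights:", importance_weights(scores))
    print("fused state:", fuse(mods, scores))
```

In an actual RL agent, the scores would presumably be produced by a learned network and the alignment term added to the RL training loss; both are assumptions of this sketch rather than details taken from the paper.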

Taxonomy

Topics: Reinforcement Learning in Robotics
