Mixture-of-Experts Meets In-Context Reinforcement Learning
Wenhao Wu, Fuhong Liu, Haoru Li, Zican Hu, Daoyi Dong, Chunlin Chen, Zhi Wang

TL;DR
This paper introduces T2MIR, a novel transformer architecture incorporating token-wise and task-wise mixture-of-experts to improve in-context reinforcement learning across diverse tasks and data modalities.
Contribution
The paper proposes T2MIR, an innovative MoE-based framework that enhances in-context RL by capturing multi-modal semantics and managing task heterogeneity with contrastive routing.
Findings
T2MIR significantly improves in-context learning performance.
It outperforms various baseline models.
The approach effectively handles diverse decision tasks.
Abstract
In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
