Mixture-of-Experts Meets In-Context Reinforcement Learning

Wenhao Wu; Fuhong Liu; Haoru Li; Zican Hu; Daoyi Dong; Chunlin Chen; Zhi Wang

arXiv:2506.05426 · cs.LG · October 29, 2025

TL;DR

This paper introduces T2MIR, a novel transformer architecture incorporating token-wise and task-wise mixture-of-experts to improve in-context reinforcement learning across diverse tasks and data modalities.

Contribution

The paper proposes T2MIR, an innovative MoE-based framework that enhances in-context RL by capturing multi-modal semantics and managing task heterogeneity with contrastive routing.

Findings

01. T2MIR significantly improves in-context learning performance.

02. It outperforms various baseline models.

03. The approach effectively handles diverse decision tasks.

Abstract

In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To…
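The layer structure described in the abstract (a feedforward layer replaced by a token-wise MoE and a task-wise MoE running in parallel) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Everything beyond "two parallel MoE branches" is an assumption made for illustration: the linear experts, the top-k softmax gating, the mean-pooled sequence representation used as the task router input, and the summation of the two branch outputs. The names `MoE` and `t2mir_style_block` are hypothetical, and the routing losses the paper may use are not modeled here.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rng, rows, cols):
    """Small random matrix standing in for learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoE:
    """Top-k mixture of linear experts (hypothetical, for illustration only)."""

    def __init__(self, rng, dim, n_experts, k):
        self.k = k
        self.gate = rand_matrix(rng, n_experts, dim)  # router weights
        self.experts = [rand_matrix(rng, dim, dim) for _ in range(n_experts)]

    def route(self, h):
        """Return [(expert_index, weight), ...] for the top-k experts given router input h."""
        logits = matvec(self.gate, h)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def apply(self, x, routing):
        """Weighted sum of the selected experts' outputs for input x."""
        out = [0.0] * len(x)
        for idx, w in routing:
            y = matvec(self.experts[idx], x)
            out = [o + w * v for o, v in zip(out, y)]
        return out


def t2mir_style_block(tokens, token_moe, task_moe):
    """Stand-in for a feedforward layer: two MoE branches applied in parallel."""
    dim = len(tokens[0])
    # Task-wise branch: one routing decision per sequence.
    # Assumption: the router sees the mean of all token vectors.
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    task_routing = task_moe.route(pooled)
    outputs = []
    for x in tokens:
        # Token-wise branch: each token is routed independently.
        tok_out = token_moe.apply(x, token_moe.route(x))
        # Task-wise branch: every token shares the sequence-level routing.
        task_out = task_moe.apply(x, task_routing)
        # Assumption: the two branch outputs are combined by summation.
        outputs.append([a + b for a, b in zip(tok_out, task_out)])
    return outputs, task_routing


rng = random.Random(0)
token_moe = MoE(rng, dim=4, n_experts=3, k=2)
task_moe = MoE(rng, dim=4, n_experts=3, k=1)
# Six toy token embeddings, e.g. interleaved state, action, and reward tokens.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
outputs, task_routing = t2mir_style_block(tokens, token_moe, task_moe)
print(len(outputs), len(outputs[0]), task_routing)
```

In this sketch the token-wise router can send a state token and a reward token in the same sequence to different experts, while the task-wise router makes a single choice for the whole sequence, so sequences from different tasks can update different expert parameters.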

Code & Models

Models

🤗 Wenhao0/T2MIR (model, ♡ 1)

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning