Learning Diverse Skills for Behavior Models with Mixture of Experts

Wangtian Shen; Jinming Ma; Mingliang Zhou; Ziyang Meng

arXiv:2601.12397·cs.RO·January 21, 2026

TL;DR

This paper introduces Di-BM, a mixture of experts approach for imitation learning that enables behavior models to learn diverse, specialized skills, improving multi-task performance and data efficiency in robotic manipulation.

Contribution

The paper proposes Di-BM, a novel mixture of experts framework with energy-based models for multi-task imitation learning, enhancing specialization and transferability of skills.

Findings

01. Di-BM outperforms state-of-the-art baselines on robotic tasks.

02. Fine-tuning Di-BM on new tasks is more data-efficient.

03. Experts specialize in different observation sub-regions, reducing interference.

Abstract

Imitation learning has demonstrated strong performance in robotic manipulation by learning from large-scale human demonstrations. While existing models excel at single-task learning, it is observed in practical applications that their performance degrades in the multi-task setting, where interference across tasks leads to an averaging effect. To address this issue, we propose to learn diverse skills for behavior models with Mixture of Experts, referred to as Di-BM. Di-BM associates each expert with a distinct observation distribution, enabling experts to specialize in sub-regions of the observation space. Specifically, we employ energy-based models to represent expert-specific observation distributions and jointly train them alongside the corresponding action models. Our approach is plug-and-play and can be seamlessly integrated into standard imitation learning methods. Extensive…
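
The gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quadratic energy functions, the linear expert action models, and all names (`energies`, `gating_weights`, `act`, `centers`, `expert_mats`) are assumptions made for illustration. Each expert k has an energy E_k(o) defining an unnormalized observation density exp(-E_k(o)), and an observation is routed to the expert whose density, weighted by a prior, is highest.

```python
import numpy as np

def energies(obs, centers, scale=1.0):
    """Hypothetical quadratic energy E_k(o) = ||o - c_k||^2 / (2 * scale^2).

    A learned energy network would replace this in practice (assumption).
    """
    diff = obs[None, :] - centers            # shape (K, D)
    return 0.5 * np.sum(diff ** 2, axis=1) / scale ** 2

def gating_weights(obs, centers, log_prior):
    """Posterior over experts: softmax of -E_k(o) + log prior_k.

    All experts share the same energy scale here, so their normalizing
    constants are equal and cancel in the softmax.
    """
    logits = -energies(obs, centers) + log_prior
    logits = logits - logits.max()           # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def act(obs, centers, log_prior, expert_mats):
    """Pick the most responsible expert and apply its (linear) action model."""
    w = gating_weights(obs, centers, log_prior)
    k = int(np.argmax(w))
    return k, expert_mats[k] @ obs

# Two experts covering different sub-regions of a 2-D observation space.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
log_prior = np.log(np.array([0.5, 0.5]))
expert_mats = np.stack([np.eye(2), 2.0 * np.eye(2)])

obs = np.array([4.5, 5.2])
k, action = act(obs, centers, log_prior, expert_mats)
print(k, action)
```

In this toy setup the observation lies close to the second center, so the second expert receives nearly all of the gating weight. Hard routing by argmax is one design choice; a weighted mixture of expert actions would be another.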

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 4·Confidence 3

Strengths

- The MoE framework introduced in this paper can learn task assignment autonomously, enhancing the potential to learn from large-scale, unstructured multi-task datasets.
- Experimental results on real-robot tasks demonstrate its effectiveness. Qualitative visualizations show that the models do learn some meaningful task assignments.
- The paper writing is well organized and easy to follow.

Weaknesses

- Limited technical contributions. After reading the method sections, it seems that most of the techniques are adopted from two prior works (Celik et al., 2022; 2024). The main difference is the change from their reinforcement learning setting to the imitation learning setting.
- Lack of reproducible simulation experiments. There are a lot of multi-task imitation learning benchmarks in simulation that are widely used in prior robotic imitation learning research, like Meta-World, Libero, RoboCasa, and R

Reviewer 02·Rating 4·Confidence 3

Strengths

- The paper is well-written. The reader can easily follow the story of the paper and understand the motivation behind the proposed methods.
- The benefits of choosing an MoE policy representation are well-grounded by intuitive figures (e.g., Fig. 3) on the training data.
- The method is validated on real-robot experiments, emphasizing its strengths.

Weaknesses

- The work lacks important related works that have employed similar ideas on learning parameterized distributions over the input (observation) space [1,2]. How does the proposed method algorithmically differ from these methods?
- A better description of the dataset is needed; i.e., which tasks are included? Are different robot types used? ... It is hard to infer the "difficulty" of a task. Commenting on the task difficulty could help the reader.
- Although I appreciate the real-robot experiments, I belie

Reviewer 03·Rating 4·Confidence 4

Strengths

- Gating visualisations nicely show that the model utilises different experts.
- The method shows strong empirical results, with improvements on several real-world robotic tasks, verified through ablations and visualizations.
- The method can be incorporated seamlessly into existing imitation learning architectures.

Weaknesses

- The paper does not mention related work that uses very similar methodology and goals, namely [1] and [2].
- In [1] they show that the optimal gating can be computed in closed form, so it is unnecessary to learn a model in every iteration; it is sufficient to learn a gating only at the end of training. What benefit do the authors see in learning the gating?
- Additionally, in [1] the authors establish convergence guarantees from an expectation-maximisation perspective. Do the authors

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI