Learning Diverse Skills for Behavior Models with Mixture of Experts
Wangtian Shen, Jinming Ma, Mingliang Zhou, Ziyang Meng

TL;DR
This paper introduces Di-BM, a mixture-of-experts (MoE) approach to imitation learning that lets behavior models learn diverse, specialized skills, improving multi-task performance and data efficiency in robotic manipulation.
Contribution
The paper proposes Di-BM, a novel mixture-of-experts framework that uses energy-based models for multi-task imitation learning, enhancing the specialization and transferability of skills.
Findings
Di-BM outperforms state-of-the-art baselines on robotic tasks.
Fine-tuning Di-BM on new tasks is more data-efficient.
Experts specialize in different observation sub-regions, reducing interference.
Abstract
Imitation learning has demonstrated strong performance in robotic manipulation by learning from large-scale human demonstrations. While existing models excel at single-task learning, in practice their performance degrades in the multi-task setting, where interference across tasks leads to an averaging effect. To address this issue, we propose to learn diverse skills for behavior models with Mixture of Experts, referred to as Di-BM. Di-BM associates each expert with a distinct observation distribution, enabling experts to specialize in sub-regions of the observation space. Specifically, we employ energy-based models to represent expert-specific observation distributions and train them jointly with the corresponding action models. Our approach is plug-and-play and can be seamlessly integrated into standard imitation learning methods. Extensive…
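The abstract does not spell out how the expert-specific observation distributions drive expert selection. One natural reading, via Bayes' rule, is p(z | o) ∝ p(o | z) p(z), with each energy-based model giving log p(o | z) = -E_z(o) + const. The sketch below illustrates that reading in NumPy; it is not the authors' implementation. The function names `gating_from_energies` and `select_actions` are hypothetical, and the code assumes every expert's EBM shares the same normalizing constant, which real EBMs generally do not. It also shows why selecting a single expert avoids the averaging effect that a weighted mean of expert actions would reintroduce.

```python
import numpy as np


def gating_from_energies(energies, log_prior=None):
    """Posterior over experts p(z | o) from per-expert energies E_z(o).

    Assumes log p(o | z) = -E_z(o) + const with the SAME constant for every
    expert (a simplifying assumption; real EBMs each have their own
    partition function).
    energies: array of shape (batch, K).
    log_prior: optional array of shape (K,) holding log p(z); uniform if None.
    """
    energies = np.asarray(energies, dtype=float)
    num_experts = energies.shape[-1]
    if log_prior is None:
        log_prior = np.full(num_experts, -np.log(num_experts))
    logits = -energies + log_prior
    # Subtract the row max before exponentiating for numerical stability.
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


def select_actions(gates, expert_actions, rng=None):
    """Pick one expert per observation instead of averaging expert actions.

    gates: (batch, K) gating probabilities.
    expert_actions: (batch, K, action_dim) proposals from each expert.
    If rng is None, use the most probable expert; otherwise sample z ~ p(z | o).
    """
    if rng is None:
        idx = gates.argmax(axis=-1)
    else:
        idx = np.array([rng.choice(gates.shape[-1], p=g) for g in gates])
    return expert_actions[np.arange(len(idx)), idx]


# Toy example: two hypothetical experts with quadratic energies
# E_z(o) = 0.5 * ||o - c_z||^2 centred on different observation regions.
centers = np.array([[0.0, 0.0], [3.0, 0.0]])
obs = np.array([[0.5, 0.0], [2.8, 0.1]])
energies = 0.5 * ((obs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
gates = gating_from_energies(energies)

# Each (hypothetical) expert proposes a different 1-D action for every observation.
expert_actions = np.array([[[1.0], [-1.0]], [[1.0], [-1.0]]])
actions = select_actions(gates, expert_actions)
print(gates.round(3))
print(actions.ravel())
```

In this toy setup each observation lies near one expert's centre, so that expert receives most of the gating mass and its action is used unchanged. Averaging the two proposals with the gate weights would instead blend +1 and -1 toward 0, which is the kind of interference the abstract describes.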
Peer Reviews
Decision: Submitted to ICLR 2026
- The MoE framework introduced in this paper can learn task assignment autonomously, enhancing the potential to learn from large-scale, unstructured multi-task datasets.
- Experimental results on real-robot tasks demonstrate its effectiveness. Qualitative visualizations show that the models do learn some meaningful task assignments.
- The paper is well organized and easy to follow.
- Limited technical contributions. After reading the method sections, it seems that most of the techniques are adopted from two prior works (Celik et al., 2022; 2024). The main difference is moving from their reinforcement learning setting to the imitation learning setting.
- Lack of reproducible simulation experiments. There are many multi-task imitation learning benchmarks in simulation that are widely used in prior robotic imitation learning research, like Meta-World, Libero, RoboCasa, and R
- The paper is well written. The reader can easily follow the story of the paper and understand the motivation behind the proposed methods.
- The benefits of choosing an MoE policy representation are well grounded by intuitive figures (e.g., Fig. 3) on the training data.
- The method is validated on real-robot experiments, emphasizing its strengths.
- The work omits important related work that has employed similar ideas for learning parameterized distributions over the input (observation) space [1,2]. How does the proposed method differ algorithmically from these methods?
- The dataset needs a better description, i.e., which tasks are included? Are different robot types used? ... It is hard to infer the "difficulty" of a task; commenting on task difficulty could help the reader.
- Although I appreciate the real-robot experiments, I belie
- Gating visualisations nicely show that the model utilises different experts.
- The method shows strong empirical results, with improvements on several real-world robotic tasks, verified through ablations and visualizations.
- The method can be incorporated seamlessly into existing imitation learning architectures.
- The paper does not mention related work with very similar methodology and goals, namely [1] and [2].
- In [1], the authors show that the optimal gating can be computed in closed form, so there is no need to learn a gating model in every iteration; it suffices to learn the gating once at the end of training. What benefit do the authors see in learning the gating?
- Additionally, in [1] the authors establish convergence guarantees from an expectation-maximisation perspective. Do the authors
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI
