Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
Ce Hao, Xuanran Zhai, Yaohua Liu, Harold Soh

TL;DR
This paper introduces SMP, a diffusion-based mixture-of-experts policy for robot manipulation that efficiently learns and composes skills, enabling scalable multi-task learning with high success and low inference cost.
Contribution
The paper proposes a novel SMP framework that uses a mixture-of-experts approach with sticky routing and variational training for scalable multi-task robot manipulation.
Findings
SMP achieves higher success rates than large diffusion baselines.
SMP has markedly lower inference cost than the same large diffusion baselines.
SMP demonstrates effective transfer learning in real robot experiments.
Abstract
Diffusion-based policies have recently shown strong results in robot manipulation, but their extension to multi-task scenarios is hindered by the high cost of scaling model size and demonstrations. We introduce Skill Mixture-of-Experts Policy (SMP), a diffusion-based mixture-of-experts policy that learns a compact orthogonal skill basis and uses sticky routing to compose actions from a small, task-relevant subset of experts at each step. A variational training objective supports this design, and adaptive expert activation at inference yields fast sampling without oversized backbones. We validate SMP in simulation and on a real dual-arm platform with multi-task learning and transfer learning tasks, where SMP achieves higher success rates and markedly lower inference cost than large diffusion baselines. These results indicate a practical path toward scalable, transferable multi-task…
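The routing idea in the abstract (a small, task-relevant subset of experts per step, kept stable by sticky routing) can be illustrated with a minimal sketch. This is not the paper's implementation: the linear cumulative-mass rule, the `stay_threshold` stickiness test, and all function names below are assumptions made for illustration, and the experts here are plain vectors rather than diffusion models.

```python
import numpy as np

def select_experts(gate_probs, mass_threshold):
    """Smallest set of experts whose summed gate probability reaches the threshold.

    Assumption: "mass" is the plain (linear) cumulative gate probability.
    """
    order = np.argsort(gate_probs)[::-1]           # experts by descending gate value
    cum = np.cumsum(gate_probs[order])
    k = int(np.searchsorted(cum, mass_threshold)) + 1
    k = min(k, len(gate_probs))                    # guard against rounding past the end
    return np.sort(order[:k])

def sticky_route(gate_probs, prev_active, mass_threshold, stay_threshold):
    """Hypothetical stickiness rule: keep the previous expert set while it
    still carries at least `stay_threshold` of the gate mass."""
    if prev_active is not None and gate_probs[prev_active].sum() >= stay_threshold:
        return prev_active
    return select_experts(gate_probs, mass_threshold)

def compose_action(expert_outputs, gate_probs, active):
    """Gate-weighted sum over the active experts only (weights renormalised)."""
    w = gate_probs[active] / gate_probs[active].sum()
    return w @ expert_outputs[active]

# Toy rollout: 4 experts, 2-D actions, three consecutive gate distributions.
expert_outputs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
gates = [
    np.array([0.60, 0.30, 0.05, 0.05]),
    np.array([0.35, 0.30, 0.30, 0.05]),   # gate drifts, previous set still holds 0.65
    np.array([0.05, 0.05, 0.80, 0.10]),   # previous set drops to 0.10, so re-route
]
active = None
history = []
for g in gates:
    active = sticky_route(g, active, mass_threshold=0.85, stay_threshold=0.6)
    history.append((active.tolist(), compose_action(expert_outputs, g, active)))
```

In this toy rollout the active set stays fixed through the second step even though a fresh selection would have added a third expert, and it only switches once the previous experts lose most of their gate mass. Only the selected experts would need to be evaluated at each step, which is the source of the inference savings the abstract refers to.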
Peer Reviews
Decision: ICLR 2026 Poster
The paper presents an intuitive method for learning skills implicitly for multi-task scenarios. The engineering in choosing the suitable method to construct the orthonormal basis, training targets for skill-specific diffusion experts, and sticky gating function is well executed.
Overall, the major concern I have is that the paper does not provide any ablation results for specific design choices such as: 1. How do the results change without the sticky gating function? Since this is one of the major contributions, it would be worth looking at how this impacts simpler MoEs like Sparse DP. 2. How do the results change with k or the mass threshold? Linear mass vs quadratic mass? 3. The practical implementation subsection in the appendix suggests that the proposed method r
1. The paper is well-organized and easy to follow. 2. The idea of an orthogonal skill basis seems very attractive; based on the ablation studies, it does provide good cross-task skill reuse with few switches. 3. The ablation studies are abundant and helpful in supporting the claimed contributions. 4. Real-world experiments look good.
1. I am curious about the training cost of learning a good orthonormal skill basis; more details on this part would be appreciated. 2. In Figure 3, for the tasks "Put bread into skillet" and "Lift tray with block in it", a considerable portion of the trajectories still has multiple experts activated at the same time with similar gate values. Does this mean that the experts still overlap in these cases?
1. This paper proposes learning a lightweight basis in the action space and introduces a novel method for learning the gating mechanism. The experiments demonstrate the effectiveness of this approach in terms of success rate and inference computation (activated parameters). The paper also presents explainable and stable routing phase transitions, such as rotation and translation. 2. This paper exhibits several appealing features resulting from its decoupled structure (action basis, routing, and
1. The paper claims that “a fixed basis may fail to capture such variability.” However, there is no comparison provided between a fixed basis and a learned basis. Moreover, from Figures 2 and 3, the action bases appear to be fixed and simple—such as left/right translation and rotation—suggesting that a fixed basis might be sufficient. How does the learned basis actually change with state? 2. In Equation (3), there are three hyperparameters. How do these parameters influence the final results? 3.
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
