MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers
Kangjun Guo, Haichao Liu, Yanji Sun, Ruhan Zhao, Jinni Zhou, Jun Ma

TL;DR
MoE-ACT introduces a sparse Mixture-of-Experts Transformer framework for multi-task bimanual manipulation, significantly improving robustness and success rates in complex robotic tasks through adaptive, language-conditioned action generation.
Contribution
This work presents a novel multi-task imitation learning framework integrating sparse MoE modules and language conditioning into a Transformer for robotic manipulation.
Findings
Outperforms vanilla ACT by 33% success rate on average.
Enhances robustness and generalization in multi-task environments.
Effective in both simulation and real-world dual-arm setups.
Abstract
The ability of robots to handle multiple tasks under a unified policy is critical for deploying embodied intelligence in real-world household and industrial applications. However, out-of-distribution variation across tasks often causes severe task interference and negative transfer when training general robotic policies. To address this challenge, we propose a lightweight multi-task imitation learning framework for bimanual manipulation, termed Mixture-of-Experts-Enhanced Action Chunking Transformer (MoE-ACT), which integrates sparse Mixture-of-Experts (MoE) modules into the Transformer encoder of ACT. The MoE layer decomposes a unified task policy into independently invoked expert components. Through adaptive activation, it naturally decouples multi-task action distributions in latent space. During decoding, Feature-wise Linear Modulation (FiLM) dynamically modulates action tokens to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics
