MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers

Kangjun Guo; Haichao Liu; Yanji Sun; Ruhan Zhao; Jinni Zhou; Jun Ma

arXiv:2603.15265·cs.RO·March 17, 2026

Open Access

TL;DR

MoE-ACT adds sparse Mixture-of-Experts modules and language conditioning to the ACT Transformer for multi-task bimanual manipulation, improving robustness and raising the success rate over vanilla ACT by 33% on average.

Contribution

This work presents a novel multi-task imitation learning framework integrating sparse MoE modules and language conditioning into a Transformer for robotic manipulation.

Findings

01. Outperforms vanilla ACT by 33% success rate on average.

02. Enhances robustness and generalization in multi-task environments.

03. Effective in both simulation and real-world dual-arm setups.

Abstract

The ability of robots to handle multiple tasks under a unified policy is critical for deploying embodied intelligence in real-world household and industrial applications. However, out-of-distribution variation across tasks often causes severe task interference and negative transfer when training general robotic policies. To address this challenge, we propose a lightweight multi-task imitation learning framework for bimanual manipulation, termed Mixture-of-Experts-Enhanced Action Chunking Transformer (MoE-ACT), which integrates sparse Mixture-of-Experts (MoE) modules into the Transformer encoder of ACT. The MoE layer decomposes a unified task policy into independently invoked expert components. Through adaptive activation, it naturally decouples multi-task action distributions in latent space. During decoding, Feature-wise Linear Modulation (FiLM) dynamically modulates action tokens to…
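The two mechanisms named in the abstract, sparse expert routing and FiLM modulation, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not state the gating function, the number of experts, the expert architecture, or the exact conditioning signal for FiLM, so everything here is an assumption chosen for illustration: top-k softmax gating, two-layer ReLU experts, a language embedding as the FiLM input, and all names (`SparseMoE`, `film`, `W_gamma`, `W_beta`).

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; stands in for learned parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SparseMoE:
    """Hypothetical sparse MoE feed-forward layer with top-k gating (assumed)."""

    def __init__(self, dim, hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.gate = rand_matrix(rng, num_experts, dim)
        # Each expert is a two-layer ReLU MLP: dim -> hidden -> dim.
        self.experts = [
            (rand_matrix(rng, hidden, dim), rand_matrix(rng, dim, hidden))
            for _ in range(num_experts)
        ]

    def route(self, token):
        """Return the indices of the top-k experts and their mixing weights."""
        logits = matvec(self.gate, token)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = ranked[: self.top_k]
        # Renormalise over the selected experts only; the rest are skipped.
        weights = softmax([logits[i] for i in top])
        return top, weights

    def __call__(self, token):
        top, weights = self.route(token)
        out = [0.0] * len(token)
        for idx, w in zip(top, weights):
            W1, W2 = self.experts[idx]
            h = [max(0.0, v) for v in matvec(W1, token)]
            y = matvec(W2, h)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


def film(action_token, cond, W_gamma, W_beta):
    """FiLM: per-feature scale and shift predicted from a conditioning vector.

    `cond` is assumed here to be a language embedding.
    """
    gamma = matvec(W_gamma, cond)
    beta = matvec(W_beta, cond)
    return [g * a + b for g, a, b in zip(gamma, action_token, beta)]


moe = SparseMoE(dim=4, hidden=8, num_experts=4, top_k=2)
token = [0.5, -1.0, 0.25, 2.0]
selected, mix = moe.route(token)          # only 2 of 4 experts are evaluated
encoded = moe(token)
modulated = film([1.0, 2.0], [1.0], [[2.0], [3.0]], [[0.5], [-1.0]])
```

In this sketch only the `top_k` selected experts run for each token, which is what makes the layer sparse: capacity grows with the number of experts while per-token compute stays roughly constant. The `film` call scales and shifts each feature of the action token independently, so the same decoder weights can yield different outputs for different instructions.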

Taxonomy

Topics: Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics