ModuleFormer: Modularity Emerges from Mixture-of-Experts
Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan

TL;DR
ModuleFormer introduces a modular neural network architecture based on Sparse Mixture of Experts that learns from uncurated data, improving efficiency, extendability, and specialization of large language models.
Contribution
It presents a novel modular architecture that induces modularity without domain-labeled data, enabling more efficient, extendable, and specialized large language models.
Findings
Achieves over twice the throughput of dense LLMs with similar performance.
More resistant to catastrophic forgetting and easily extendable with new modules.
Finetuning allows modules to specialize, enabling lightweight deployment.
Abstract
Large Language Models (LLMs) have achieved remarkable results. However, existing models are expensive to train and deploy, and it is also difficult to expand their knowledge beyond pre-training data without forgetting previous knowledge. This paper proposes a new neural network architecture, ModuleFormer, that leverages modularity to improve the efficiency and flexibility of large language models. ModuleFormer is based on the Sparse Mixture of Experts (SMoE). Unlike the previous SMoE-based modular language model, which requires domain-labeled data to learn domain-specific experts, ModuleFormer can induce modularity from uncurated data with its new load balancing and concentration losses. ModuleFormer is a modular architecture that includes two different types of modules: new stick-breaking attention heads and feedforward experts. Different modules are sparsely activated conditioned on…
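To make the idea of sparse activation concrete, the following is a minimal sketch of generic top-k expert routing as commonly used in Sparse Mixture of Experts layers. It is not taken from the paper: the linear router, the toy scaling experts, and all function names (`top_k_route`, `sparse_moe_forward`, `make_scaling_expert`) are hypothetical, and it does not implement ModuleFormer's load balancing loss, concentration loss, or stick-breaking attention. It only illustrates why activating k of N experts per token reduces compute.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Returns {expert_index: gate_weight}. Experts missing from the dict are
    never evaluated for this token, which is the source of the savings.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def sparse_moe_forward(token, experts, router_weights, k=2):
    """Run only the selected experts on one token vector and mix their outputs."""
    # Assumption: a plain linear router, one logit per expert.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    gates = top_k_route(logits, k)
    output = [0.0] * len(token)
    for idx, gate in gates.items():
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, gates


def make_scaling_expert(scale):
    """Toy stand-in for a feedforward expert: multiplies the token by a constant."""
    return lambda vec: [scale * v for v in vec]


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 8
    experts = [make_scaling_expert(s) for s in range(1, num_experts + 1)]
    router_weights = [
        [random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)
    ]
    token = [0.5, -1.0, 2.0, 0.1]
    out, gates = sparse_moe_forward(token, experts, router_weights, k=2)
    print("active experts:", sorted(gates))
    print("gate weights sum:", sum(gates.values()))
```

With k=2 and eight experts, only two expert functions are called per token. Left unconstrained, a router of this kind can collapse onto a few experts, which is the usual motivation for auxiliary losses such as the load balancing loss the abstract mentions.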
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
