U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

TL;DR
This paper demonstrates that replacing all feed-forward layers with Mixture-of-Experts (MoE) in speech recognition models can scale parameters by 4.7x, achieving large model performance with minimal impact on real-time factor and enabling flexible decoding modes.
Contribution
The study shows that a simple substitution of MoE layers for all FFN layers is effective for ASR, enabling large-scale models with minimal complexity and maintaining deployment efficiency.
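To make the substitution concrete, here is a minimal sketch of a feed-forward block replaced by a routed mixture of FFN experts. It is an illustration under stated assumptions, not the paper's implementation: the helper names (make_expert, matvec, ffn, moe_ffn), the toy sizes, the ReLU activation, the top-2 routing, and the renormalisation of gate weights over the selected experts are all hypothetical choices made for this example. Plain Python lists are used for readability; a real model would use a tensor library.

```python
import math
import random

def make_expert(d_model, d_ff, rng):
    """Create one FFN expert as two weight matrices (biases omitted for brevity)."""
    w_in = [[rng.gauss(0, 0.1) for _ in range(d_ff)] for _ in range(d_model)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_ff)]
    return w_in, w_out

def matvec(x, w):
    """Compute x @ w for a vector x (length n) and a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def ffn(x, expert):
    """Standard two-layer feed-forward block with a ReLU in between."""
    w_in, w_out = expert
    hidden = [max(0.0, h) for h in matvec(x, w_in)]
    return matvec(hidden, w_out)

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_ffn(x, router_w, experts, top_k=2):
    """Route one frame x to its top_k experts and mix their outputs.

    Only the selected experts are evaluated, so per-frame compute depends
    on top_k rather than on the total number of experts.
    """
    probs = softmax(matvec(x, router_w))
    chosen = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:top_k]
    norm = sum(probs[e] for e in chosen)  # assumption: renormalise over chosen experts
    out = [0.0] * len(x)
    for e in chosen:
        gate = probs[e] / norm
        out = [o + gate * v for o, v in zip(out, ffn(x, experts[e]))]
    return out, chosen

# Toy usage with hypothetical sizes (not taken from the paper).
rng = random.Random(0)
d_model, d_ff, n_experts = 4, 8, 4
experts = [make_expert(d_model, d_ff, rng) for _ in range(n_experts)]
router_w = [[rng.gauss(0, 0.1) for _ in range(n_experts)] for _ in range(d_model)]
frame = [0.5, -1.0, 0.25, 2.0]
out, chosen = moe_ffn(frame, router_w, experts)
print("experts used for this frame:", chosen)
```

In this sketch the router is a single linear projection followed by a softmax, and no auxiliary load-balancing loss is involved, which is in the spirit of the "simple substitution" described above.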
Findings
Scaled a Conformer from 225M to 1B parameters while reaching WER comparable to a large dense model.
Achieved 4.7x parameter scaling with minimal impact on RTF.
Unified 2-pass framework supports both streaming and non-streaming decoding.
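The findings above rest on a simple piece of arithmetic: adding experts multiplies the stored parameters, while the parameters actually used per frame depend only on how many experts the router selects. The sketch below counts both for a single feed-forward block. The function names and all sizes (d_model, d_ff, 8 experts, top-2 routing) are hypothetical values chosen for illustration, and the counts cover only the FFN block, so they are not expected to reproduce the paper's 4.7x whole-model figure.

```python
def ffn_params(d_model, d_ff):
    """Weights plus biases of a two-layer feed-forward block."""
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

def moe_params(d_model, d_ff, n_experts, top_k):
    """Return (total, active-per-frame) parameter counts for one MoE block."""
    router = d_model * n_experts          # linear router, no bias (assumption)
    expert = ffn_params(d_model, d_ff)
    total = router + n_experts * expert   # everything that must be stored
    active = router + top_k * expert      # what a single frame actually touches
    return total, active

# Hypothetical sizes, chosen only to illustrate the ratio.
d_model, d_ff = 512, 2048
dense = ffn_params(d_model, d_ff)
total, active = moe_params(d_model, d_ff, n_experts=8, top_k=2)

print(f"dense FFN params:     {dense:,}")
print(f"MoE total params:     {total:,} ({total / dense:.2f}x dense)")
print(f"MoE active per frame: {active:,} ({active / dense:.2f}x dense)")
```

Because attention and convolution modules are unchanged by the substitution, the whole-model parameter ratio is smaller than the per-block ratio printed here, and the per-frame cost grows far more slowly than the stored parameter count.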
Abstract
Scale has opened new frontiers in natural language processing, but at a high cost. In response, Mixture-of-Experts (MoE) models, which learn to activate only a subset of parameters during training and inference, have been proposed as an energy-efficient path to even larger and more capable language models. This shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporate MoE into ASR models have complex designs, such as routing frames via a supplementary embedding network, improving multilingual ability for the experts, and utilizing dedicated auxiliary losses for either expert load balancing or specific language handling. We found that delicate designs are not necessary, while an embarrassingly simple substitution of MoE layers for all Feed-Forward Network (FFN) layers is competent for…
Taxonomy
Topics: Nuclear Physics and Applications · Radiation Detection and Scintillator Technologies
Methods: Mixture of Experts
