AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation

Ganesh Jawahar; Subhabrata Mukherjee; Xiaodong Liu; Young Jin Kim; Muhammad Abdul-Mageed; Laks V. S. Lakshmanan; Ahmed Hassan Awadallah; Sebastien Bubeck; Jianfeng Gao

arXiv:2210.07535·cs.CL·June 9, 2023


TL;DR

AutoMoE introduces a neural architecture search (NAS) framework for designing heterogeneous mixture-of-experts (MoE) models for neural machine translation. It searches under computational constraints such as FLOPs and latency for efficient, adaptive-computation models while maintaining translation quality.

Contribution

The paper proposes AutoMoE, a NAS-based method for designing heterogeneous MoE models that trade off computational cost against translation quality.
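To make "searching heterogeneous MoE designs under a computational constraint" concrete, here is a minimal sketch that assumes a toy search space in which each layer independently picks an expert count and an expert FFN hidden size. Everything in it is hypothetical: the names `LayerConfig`, `EXPERT_CHOICES`, `FFN_DIM_CHOICES` and `random_search` are illustrative, the FLOPs estimate covers only the FFN/MoE sublayer, and plain random search with a parameter-count proxy stands in for a real quality estimate. It is not AutoMoE's actual search procedure.

```python
import random
from dataclasses import dataclass

# Hypothetical search space; these choice lists are illustrative, not the paper's.
EXPERT_CHOICES = [1, 2, 4]            # 1 expert == an ordinary dense FFN layer
FFN_DIM_CHOICES = [1024, 2048, 3072]

@dataclass(frozen=True)
class LayerConfig:
    num_experts: int
    ffn_dim: int

def layer_flops_per_token(cfg: LayerConfig, d_model: int, top_k: int = 1) -> int:
    """Approximate FLOPs for one token through one FFN/MoE sublayer (multiply-add = 2).

    Router: a d_model x num_experts matmul (skipped when there is a single expert).
    Experts: only top_k experts run, each doing two d_model x ffn_dim matmuls.
    """
    router = 2 * d_model * cfg.num_experts if cfg.num_experts > 1 else 0
    active = min(top_k, cfg.num_experts)
    return router + active * 4 * d_model * cfg.ffn_dim

def layer_params(cfg: LayerConfig, d_model: int) -> int:
    """Expert weight count (biases ignored); grows with num_experts, unlike per-token FLOPs."""
    return cfg.num_experts * 2 * d_model * cfg.ffn_dim

def random_search(num_layers, d_model, flops_budget, trials=2000, seed=0):
    """Sample per-layer configs, discard those over the FLOPs budget, and keep the one
    with the most expert parameters (a crude capacity proxy, not validation BLEU)."""
    rng = random.Random(seed)
    best, best_params = None, -1
    for _ in range(trials):
        arch = tuple(LayerConfig(rng.choice(EXPERT_CHOICES), rng.choice(FFN_DIM_CHOICES))
                     for _ in range(num_layers))
        flops = sum(layer_flops_per_token(c, d_model) for c in arch)
        if flops > flops_budget:
            continue  # violates the computational constraint
        params = sum(layer_params(c, d_model) for c in arch)
        if params > best_params:
            best, best_params = arch, params
    return best, best_params

# Use a homogeneous dense stack (1 expert, ffn_dim 2048 in every layer) as the budget.
d_model, layers = 512, 6
homogeneous = tuple(LayerConfig(1, 2048) for _ in range(layers))
budget = sum(layer_flops_per_token(c, d_model) for c in homogeneous)
arch, params = random_search(layers, d_model, budget)
```

The point of the toy cost model is that adding experts to a layer raises its parameter count but barely changes per-token FLOPs under top-1 routing. A search can therefore spend capacity unevenly across layers, for example more experts in some layers and a smaller `ffn_dim` in others, while staying within the same compute budget.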

Findings

01

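4x inference speedup on CPU over manually designed Transformers

02

FLOPs reduction over manually designed Transformers

03

BLEU parity with the dense Transformer and within 1 BLEU point of the MoE SwitchTransformer, on aggregate over NMT benchmark datasets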

Abstract

Mixture-of-Experts (MoE) models have obtained state-of-the-art performance in Neural Machine Translation (NMT) tasks. Existing works in MoE mostly consider a homogeneous design where the same number of experts of the same size are placed uniformly throughout the network. Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design. To this end, we develop AutoMoE, a framework for designing heterogeneous MoEs under computational constraints. AutoMoE leverages Neural Architecture Search (NAS) to obtain efficient sparse MoE sub-transformers with 4x inference speedup (CPU) and FLOPs reduction over manually designed Transformers, with BLEU parity with the dense Transformer and within 1 BLEU point of the MoE SwitchTransformer, on aggregate over benchmark datasets for NMT. Heterogeneous search space with dense and sparsely activated…
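The abstract contrasts homogeneous MoE designs with heterogeneous ones and refers to sparsely activated sub-transformers. The sketch below illustrates what that means for a single token. It assumes Switch-style top-1 routing: a softmax router picks one expert, and that expert's output is scaled by its gate probability before the residual add. The stack is heterogeneous because the number of experts and the expert hidden size differ per layer. `Top1MoELayer`, `CONFIG` and the tiny random weights are hypothetical and for illustration only. This is not code from the microsoft/automoe repository.

```python
import math
import random

def matvec(m, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class Top1MoELayer:
    """Switch-style top-1 MoE feed-forward layer (illustrative, random weights)."""

    def __init__(self, d_model, num_experts, ffn_dim, rng):
        self.router = rand_matrix(num_experts, d_model, rng)  # num_experts x d_model
        # Each expert is a two-matmul FFN: W_in (ffn_dim x d_model), W_out (d_model x ffn_dim).
        self.experts = [(rand_matrix(ffn_dim, d_model, rng), rand_matrix(d_model, ffn_dim, rng))
                        for _ in range(num_experts)]

    def __call__(self, x):
        probs = softmax(matvec(self.router, x))
        e = max(range(len(probs)), key=probs.__getitem__)  # only this expert runs
        w_in, w_out = self.experts[e]
        hidden = [max(0.0, h) for h in matvec(w_in, x)]    # ReLU
        y = matvec(w_out, hidden)
        # Scale by the gate probability, then add the residual connection.
        return [xi + probs[e] * yi for xi, yi in zip(x, y)], e

# Heterogeneous stack: (num_experts, ffn_dim) varies by layer; 1 expert == dense FFN.
CONFIG = [(4, 32), (1, 64), (2, 16)]
rng = random.Random(0)
d_model = 8
stack = [Top1MoELayer(d_model, n, h, rng) for n, h in CONFIG]

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
routes = []  # index of the expert each layer activated for this token
for layer in stack:
    x, chosen = layer(x)
    routes.append(chosen)
```

Because only one expert runs per layer, per-token compute tracks that expert's `ffn_dim`, while total parameters grow with the expert count. This is why a sparse MoE can add capacity without a proportional increase in FLOPs.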


Code & Models

Repositories

microsoft/automoe
PyTorch · Official


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Residual Connection · Dropout · Position-Wise Feed-Forward Layer