Task-Based MoE for Multitask Multilingual Machine Translation

Hai Pham; Young Jin Kim; Subhabrata Mukherjee; David P. Woodruff; Barnabas Poczos; Hany Hassan Awadalla

arXiv:2308.15772·cs.CL·October 26, 2023

TL;DR

This paper introduces a task-aware MoE architecture with shared dynamic adapters for multitask multilingual machine translation, improving performance and generalization over traditional task-agnostic MoE models.

Contribution

It proposes a novel task-informed MoE design with shared adapters, enhancing multitask translation and enabling efficient adaptation to new tasks.

Findings

1. Outperforms dense and canonical MoE models in multilingual translation
2. Improves task generalization and adaptation efficiency
3. Demonstrates advantages of task-specific adapters in MoE models

Abstract

Mixture-of-experts (MoE) architectures have proven to be a powerful method for training deep models across diverse tasks and applications. However, current MoE implementations are task-agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different levels of granularity through shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approach over dense and canonical MoE models on multi-task multilingual machine translation. With task-specific adapters, our models can additionally generalize to new tasks efficiently.
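To make the contrast between task-agnostic and task-informed routing concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract gives no architectural details, so everything here is an illustrative assumption, including the class name `TaskAwareMoE`, the choice to feed a learned task embedding to the gate, top-1 routing, and the per-task bottleneck adapter applied after the experts.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TaskAwareMoE:
    """Toy top-1 MoE layer (hypothetical, for illustration only).

    Assumption: the router sees each token concatenated with a task
    embedding, and a small per-task adapter follows the expert output.
    """

    def __init__(self, d_model, n_experts, n_tasks, d_task=4, d_adapter=2, seed=0):
        rng = np.random.default_rng(seed)
        self.task_emb = rng.normal(size=(n_tasks, d_task))
        self.w_gate = rng.normal(size=(d_model + d_task, n_experts))
        self.experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.1
        # Per-task bottleneck adapter weights (down- and up-projection).
        self.down = rng.normal(size=(n_tasks, d_model, d_adapter)) * 0.1
        self.up = rng.normal(size=(n_tasks, d_adapter, d_model)) * 0.1

    def route(self, tokens, task_id):
        """Return gate probabilities of shape (n_tokens, n_experts)."""
        n_tokens = tokens.shape[0]
        task = np.broadcast_to(self.task_emb[task_id], (n_tokens, self.task_emb.shape[1]))
        gate_in = np.concatenate([tokens, task], axis=1)
        return softmax(gate_in @ self.w_gate, axis=1)

    def forward(self, tokens, task_id):
        probs = self.route(tokens, task_id)
        choice = probs.argmax(axis=1)  # top-1 expert per token
        hidden = np.empty_like(tokens)
        for i, e in enumerate(choice):
            hidden[i] = probs[i, e] * (tokens[i] @ self.experts[e])
        # Residual task adapter: ReLU bottleneck selected by task id.
        adapted = np.maximum(hidden @ self.down[task_id], 0.0) @ self.up[task_id]
        return hidden + adapted, choice
```

A task-agnostic gate would compute `softmax(tokens @ w_gate)` from the token alone; here the same token can be routed differently depending on `task_id`, for example `TaskAwareMoE(8, 4, 3).forward(np.ones((5, 8)), task_id=1)`.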

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning