THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation
Yunlong Liang, Fandong Meng, Jie Zhou

TL;DR
THOR-MoE introduces hierarchical task-guided and context-aware routing in neural machine translation, improving expert selection and translation quality across multiple domains and languages.
Contribution
It proposes a novel hierarchical routing method that incorporates predicted task (domain/language) labels and sentence context, enhancing MoE performance without requiring gold task knowledge at deployment time.
Findings
Achieves superior translation performance on multi-domain and multilingual benchmarks.
Operates as a plug-and-play module compatible with existing routing schemes.
Improves BLEU scores with fewer activated parameters.
Abstract
Sparse Mixture-of-Experts (MoE) models have achieved significant progress in neural machine translation (NMT). However, current MoE solutions have two limitations that may lead to sub-optimal performance: 1) they directly inject NMT task knowledge (e.g., domain- or language-specific knowledge) into MoE, even though such knowledge is generally unavailable in practical applications, and they neglect the naturally grouped domain/linguistic properties; 2) expert selection depends only on the localized token representation and ignores the context, which captures the full state of each token from a global view. To address these limitations, we propose THOR-MoE, which equips MoE with hierarchical task-guided and context-responsive routing policies. Specifically, it 1) first predicts the domain/language label and then extracts a mixed domain/language representation to allocate task-level…
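The routing idea in the abstract can be illustrated with a toy sketch. The abstract is cut off before it specifies how the task-level representation allocates experts, so what follows is a hypothetical illustration, not the authors' formulation. Assumptions (all hypothetical names and choices): the sentence context is the mean of the token states; a linear classifier `w_task` predicts a domain distribution from that context, so no gold label is needed at inference; the "mixed" domain representation is the probability-weighted sum of per-domain embeddings `domain_emb`; and each token's router input is the elementwise sum of its own state, the context vector, and the mixed representation, followed by standard top-k gating with `w_router`.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(w, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def mean_pool(tokens):
    # Average token states into one sentence-level context vector.
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def top_k(probs, k):
    # Indices of the k largest gate probabilities.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def hierarchical_route(tokens, w_task, domain_emb, w_router, k=2):
    """Hypothetical two-level router: sentence-level task prediction, then token-level top-k."""
    context = mean_pool(tokens)                        # global view of the sentence
    domain_probs = softmax(matvec(w_task, context))    # predicted domain/language distribution
    # Mixed domain representation: soft combination of domain embeddings (assumption).
    mixed = [sum(p * emb[j] for p, emb in zip(domain_probs, domain_emb))
             for j in range(len(context))]
    routes = []
    for h in tokens:
        # Router input combines token state, context, and task signal (assumption: summation).
        x = [hj + cj + mj for hj, cj, mj in zip(h, context, mixed)]
        gate = softmax(matvec(w_router, x))
        routes.append(top_k(gate, k))
    return domain_probs, routes

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_model, n_domains, n_experts = 4, 3, 4
tokens = rand_matrix(5, d_model)  # 5 toy token states
domain_probs, routes = hierarchical_route(
    tokens,
    w_task=rand_matrix(n_domains, d_model),
    domain_emb=rand_matrix(n_domains, d_model),
    w_router=rand_matrix(n_experts, d_model),
    k=2,
)
print(domain_probs)
print(routes)
```

Because the domain distribution is predicted rather than supplied, the same router runs unchanged when no task label is available at inference, which is the property the contribution statement emphasizes; swapping summation for concatenation would only change the router's input width.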
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Neural Network Applications · Topic Modeling
Methods: Mixture of Experts
