THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation

Yunlong Liang; Fandong Meng; Jie Zhou

arXiv:2505.14173 · cs.CL · May 21, 2025

TL;DR

THOR-MoE introduces hierarchical task-guided and context-aware routing in neural machine translation, improving expert selection and translation quality across multiple domains and languages.

Contribution

It proposes a novel hierarchical routing method that incorporates task labels and context, enhancing MoE performance without requiring task knowledge during deployment.

Findings

01 Achieves superior translation performance on multi-domain and multilingual benchmarks.

02 Operates as a plug-and-play module compatible with existing routing schemes.

03 Improves BLEU scores with fewer activated parameters.

Abstract

The sparse Mixture-of-Experts (MoE) has achieved significant progress in neural machine translation (NMT). However, current MoE solutions have two limitations that may lead to sub-optimal performance: 1) they directly inject NMT task knowledge into the MoE (e.g., domain- or linguistics-specific knowledge), which is generally unavailable in practical applications, and they neglect the naturally grouped domain/linguistic properties; 2) expert selection depends only on the localized token representation and ignores the context, which captures the state of each token from a global view. To address these limitations, we propose THOR-MoE, which equips the MoE with hierarchical task-guided and context-responsive routing policies. Specifically, it 1) first predicts the domain/language label and then extracts a mixed domain/language representation to allocate task-level…
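
The two routing ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract is truncated here, so every detail below is an assumption made for illustration, including the function names (`hierarchical_route`, `top_k`), the weight matrices (`W_task`, `W_group`, `W_tok`), the use of a mean-pooled context vector, and the idea that the task-level step narrows the candidate expert set before token-level top-k routing.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def top_k(scores, k, allowed=None):
    """Indices of the k highest scores, optionally restricted to `allowed`."""
    idx = range(len(scores)) if allowed is None else allowed
    return sorted(idx, key=lambda i: scores[i], reverse=True)[:k]


def hierarchical_route(tokens, W_task, task_emb, W_group, W_tok, n_group, k):
    """Hypothetical two-level router (all parameters are illustrative)."""
    d = len(tokens[0])
    # Context vector: mean of the token representations (an assumption).
    ctx = [sum(t[j] for t in tokens) / len(tokens) for j in range(d)]
    # 1) Predict a soft task (domain/language) label from the context,
    #    so no gold task label is needed at inference time.
    task_probs = softmax(matvec(W_task, ctx))
    # 2) Mixed task representation: probability-weighted task embeddings.
    mixed = [sum(p * e[j] for p, e in zip(task_probs, task_emb))
             for j in range(d)]
    # 3) Task level: pre-select a candidate subset of experts.
    candidates = top_k(matvec(W_group, mixed), n_group)
    # 4) Token level: route each token within the candidates, using the
    #    token representation enriched with the context vector.
    routes = []
    for t in tokens:
        h = [a + b for a, b in zip(t, ctx)]
        scores = matvec(W_tok, h)
        chosen = top_k(scores, k, candidates)
        gates = softmax([scores[i] for i in chosen])
        routes.append(list(zip(chosen, gates)))
    return task_probs, candidates, routes


def rand_matrix(rows, cols):
    """Random Gaussian matrix standing in for learned weights."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim, n_tasks, n_experts = 4, 3, 8
tokens = rand_matrix(5, dim)
task_probs, candidates, routes = hierarchical_route(
    tokens,
    W_task=rand_matrix(n_tasks, dim),
    task_emb=rand_matrix(n_tasks, dim),
    W_group=rand_matrix(n_experts, dim),
    W_tok=rand_matrix(n_experts, dim),
    n_group=4,
    k=2,
)
print(candidates, routes[0])
```

In this sketch each token activates `k` experts drawn only from the `n_group` task-level candidates, which is one plausible way a hierarchical scheme could reduce the number of activated parameters. The weights are random placeholders, so the printed routes carry no meaning beyond showing the data flow.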

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Neural Network Applications · Topic Modeling

Methods: Mixture of Experts