MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation

Langlin Huang; Mengyu Bu; Yang Feng

arXiv:2411.01474 · cs.CL · February 10, 2025

TL;DR

MoCE introduces an adaptive mixture model for byte-based neural machine translation, improving semantic contextualization and outperforming existing methods in multilingual settings without extensive hyper-parameter tuning.

Contribution

The paper proposes MoCE, a novel adaptive mixture of attention heads as contextualization experts, enhancing byte-level translation by dynamically selecting and combining contextualization strategies.
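As a rough illustration of what "selecting and combining" experts can mean, the sketch below picks the top-scoring experts for one token and mixes their outputs with softmax weights. It is a generic mixture-of-experts gate written under stated assumptions, not the authors' implementation: the names `mix_experts`, `router_scores`, and `top_k`, and the choice of top-k selection followed by a softmax over the selected scores, are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_outputs, router_scores, top_k=2):
    """Mix the outputs of the top_k highest-scoring experts for one token.

    expert_outputs: one output vector per expert (hypothetical stand-ins
                    for per-head contextualized representations).
    router_scores:  one unnormalized score per expert (assumed to come
                    from some learned router, which is not modeled here).
    Returns the mixed vector and the indices of the selected experts.
    """
    # Select the top_k experts by router score.
    ranked = sorted(
        range(len(router_scores)),
        key=lambda i: router_scores[i],
        reverse=True,
    )[:top_k]

    # Normalize only the selected scores so the mixing weights sum to 1.
    weights = softmax([router_scores[i] for i in ranked])

    # Weighted sum of the selected experts' output vectors.
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for weight, idx in zip(weights, ranked):
        for d in range(dim):
            mixed[d] += weight * expert_outputs[idx][d]
    return mixed, ranked


if __name__ == "__main__":
    outputs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    scores = [2.0, 2.0, -1.0]
    print(mix_experts(outputs, scores, top_k=2))
```

With equal scores for the first two experts and a low score for the third, the gate ignores the third expert and averages the first two. How MoCE actually scores, selects, and mixes attention heads is described in the paper and the `ictnlp/moce` repository.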

Findings

01. Outperforms existing byte-based translation methods on the Ted-59 dataset.

02. Requires fewer parameters than subword-based models.

03. Effectively adapts to language variations without manual hyper-parameter tuning.

Abstract

Byte-based machine translation systems have shown significant potential in massively multilingual settings. Unicode encoding, which maps each character to specific byte(s), eliminates the emergence of unknown words, even in new languages. This avoids out-of-vocabulary risk in multilingual translation and enables broad language scalability. However, byte-level tokenization results in sequences that are hard to interpret due to limited semantic information per byte. Local contextualization has proven effective in assigning initial semantics to tokens, improving sentence comprehension. Nevertheless, variations in encoding rules across languages necessitate an adaptive approach for effective contextualization. To this end, we propose Mixture of Contextualization Experts (MoCE), adaptively selecting and mixing attention heads, which are treated as contextualization experts. This enhances the…
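The point about encoding rules varying across languages can be seen directly from UTF-8, the byte encoding commonly used for byte-level models. The short sketch below counts how many bytes each character occupies; the helper name `utf8_lengths` and the sample words are illustrative choices, not taken from the paper.

```python
def utf8_lengths(text):
    """Return the number of UTF-8 bytes used by each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    # Illustrative samples: the word "cat" in three scripts.
    samples = {"English": "cat", "Russian": "кот", "Chinese": "猫"}
    for language, word in samples.items():
        byte_values = list(word.encode("utf-8"))
        print(language, word, utf8_lengths(word), byte_values)
```

Latin letters take one byte each, Cyrillic letters two, and common Chinese characters three, so a fixed-width local context window covers a different number of characters depending on the script. This is one way to read the abstract's motivation for an adaptive contextualization scheme.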

Code & Models

Repositories

ictnlp/moce
PyTorch · Official

Videos

MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Softmax · Attention Is All You Need