Vocabulary Learning via Optimal Transport for Neural Machine Translation
Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li

TL;DR
This paper introduces VOLT, an optimal transport-based method for selecting vocabularies in neural machine translation, leading to improved translation quality and efficiency without trial training.
Contribution
VOLT formulates vocabulary selection as an optimal transport problem and provides an efficient, trial-free solution that outperforms existing methods in translation tasks.
Findings
VOLT achieves nearly 70% vocabulary size reduction.
VOLT improves BLEU scores by 0.5 on English-German translation.
VOLT significantly reduces search time compared to BPE-search.
Abstract
The choice of token vocabulary affects the performance of machine translation. This paper aims to figure out what is a good vocabulary and whether one can find the optimal vocabulary without trial training. To answer these questions, we first provide an alternative understanding of the role of vocabulary from the perspective of information theory. Motivated by this, we formulate the quest of vocabularization -- finding the best token dictionary with a proper size -- as an optimal transport (OT) problem. We propose VOLT, a simple and efficient solution without trial training. Empirical results show that VOLT outperforms widely-used vocabularies in diverse scenarios, including WMT-14 English-German and TED's 52 translation directions. For example, VOLT achieves almost 70% vocabulary size reduction and 0.5 BLEU gain on English-German translation. Also, compared to BPE-search, VOLT reduces…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
