Sub-Word Alignment Is Still Useful: A Vest-Pocket Method for Enhancing Low-Resource Machine Translation

Minhan Xu; Yu Hong

arXiv:2205.04067 · cs.CL · May 10, 2022

TL;DR

This paper extends Parent-Child transfer learning for low-resource machine translation by duplicating embeddings between aligned sub-words, which yields substantial BLEU improvements on My-En, Id-En, and Tr-En and cuts training time by 63.8%.

Contribution

It extends Parent-Child transfer learning with embedding duplication between aligned sub-words, improving both translation quality (BLEU) and training time in low-resource translation tasks.

Findings

01. Achieved BLEU scores of 22.5, 28.0, and 18.1 on My-En, Id-En, and Tr-En, respectively.

02. Reduced training time by 63.8%, to 1.6 hours on a 16GB Tesla P100 GPU.

03. The method is computationally efficient, and the authors state that all models and source code will be made publicly available.
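The training-time figures can be cross-checked with a little arithmetic. The sketch below assumes that the 1.6 hours is the training time after the 63.8% reduction, which is how the abstract reads. The implied baseline duration is an inference, not a number reported on this page.

```python
# Assumption: 1.6 h is the reduced training time, i.e. baseline * (1 - 0.638).
reduction = 0.638      # fractional reduction in training time (from the abstract)
reduced_hours = 1.6    # reported training time on a 16GB Tesla P100 GPU

# Invert the relation to estimate the implied (unreported) baseline duration.
baseline_hours = reduced_hours / (1.0 - reduction)
saved_hours = baseline_hours - reduced_hours

print(f"implied baseline: {baseline_hours:.2f} h, saved: {saved_hours:.2f} h")
```

Under that assumption the baseline run would take roughly 4.4 hours, so the method saves a little under 3 hours per training run.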

Abstract

We leverage embedding duplication between aligned sub-words to extend the Parent-Child transfer learning method and thereby improve low-resource machine translation. We conduct experiments on benchmark datasets for the My-En, Id-En and Tr-En translation scenarios. The test results show that our method produces substantial improvements, achieving BLEU scores of 22.5, 28.0 and 18.1, respectively. In addition, the method is computationally efficient: it reduces training time by 63.8%, to 1.6 hours when training on a 16GB Tesla P100 GPU. All the models and source code used in the experiments will be made publicly available to support reproducible research.
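To make "embedding duplication between aligned sub-words" concrete, here is a minimal sketch of the general idea, not the authors' implementation. It assumes that a sub-word alignment (child sub-word to parent sub-word) has already been computed by some external aligner. The function name `init_child_embeddings`, the toy vocabularies, and the random initialisation for unaligned sub-words are all hypothetical choices made for illustration.

```python
import random

def init_child_embeddings(parent_vocab, parent_emb, child_vocab, alignment, dim, seed=0):
    """Build a child embedding table (hypothetical helper).

    For each child sub-word that is aligned to a parent sub-word, copy the
    parent's trained vector; initialise every other row randomly.
    """
    rng = random.Random(seed)
    parent_index = {tok: i for i, tok in enumerate(parent_vocab)}
    child_emb, copied = [], 0
    for tok in child_vocab:
        parent_tok = alignment.get(tok)  # None when the sub-word has no alignment
        if parent_tok in parent_index:
            # Duplicate (copy, not share) the parent embedding row.
            child_emb.append(list(parent_emb[parent_index[parent_tok]]))
            copied += 1
        else:
            # Assumption: unaligned sub-words get a small Gaussian init.
            child_emb.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return child_emb, copied

# Toy data: invented tokens and vectors, purely for illustration.
parent_vocab = ["haus", "wasser", "en"]
parent_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
child_vocab = ["rumah", "air", "nya"]
alignment = {"rumah": "haus", "air": "wasser"}  # child sub-word -> parent sub-word

child_emb, copied = init_child_embeddings(
    parent_vocab, parent_emb, child_vocab, alignment, dim=2
)
print(copied, child_emb[0], child_emb[1])
```

In a Parent-Child setup, a table built this way would initialise the child model's embedding layer before fine-tuning on the low-resource pair, so aligned sub-words start from trained vectors rather than from scratch.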

Code & Models

Repositories

Cosmos-Break/transfer-mt-submit (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis