Lingua Custodia's participation at the WMT 2021 Machine Translation using Terminologies shared task
Melissa Ailem, Jingshu Liu, Raheel Qader

TL;DR
This paper presents Lingua Custodia's approach to English-to-French, English-to-Russian, and English-to-Chinese machine translation: a Transformer model whose training procedure is modified in two ways so that it better satisfies terminology constraints.
Contribution
It introduces two changes to the standard training procedure, training data augmentation and constraint token masking, to improve terminology constraint satisfaction in neural machine translation.
Findings
The method satisfies most terminology constraints.
It maintains high translation quality.
Training data augmentation encourages the model to copy terminology constraint terms into the output.
Abstract
This paper describes Lingua Custodia's submission to the WMT21 shared task on machine translation using terminologies. We consider three directions, namely English to French, Russian, and Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method which introduces two main changes to the standard procedure to handle terminologies. The first one consists in augmenting the training data in such a way as to encourage the model to learn a copy behavior when it encounters terminology constraint terms. The second change is constraint token masking, whose purpose is to ease copy behavior learning and to improve model generalization. Empirical results show that our method satisfies most terminology constraints while maintaining high translation quality.
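The two changes described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: the tag tokens (`<term>`, `<trans>`, `</term>`), the `<mask>` token, the `mask_prob` parameter, and the function name `annotate` are all hypothetical, and the authors' actual markup and masking schedule may differ. The idea shown is only the general one: the target-side term is inlined next to its source term so the model can learn to copy it, and the source term is sometimes replaced by mask tokens so the model relies on the provided translation rather than on the specific source words.

```python
import random

# Hypothetical markup; the abstract does not specify the real tag or mask tokens.
TERM_START, TERM_SEP, TERM_END = "<term>", "<trans>", "</term>"
MASK = "<mask>"


def annotate(source_tokens, constraints, mask_prob=0.5, rng=None):
    """Inline target terms after matching source terms, optionally masking the latter.

    source_tokens: list of source-side tokens.
    constraints:   list of (source_term_tokens, target_term_tokens) pairs.
    mask_prob:     assumed probability of masking a matched source term.
    """
    rng = rng or random.Random()
    out = []
    i = 0
    while i < len(source_tokens):
        # Look for a constraint whose source term starts at position i.
        match = None
        for src_term, tgt_term in constraints:
            n = len(src_term)
            if n and source_tokens[i:i + n] == src_term:
                match = (src_term, tgt_term)
                break
        if match is None:
            out.append(source_tokens[i])
            i += 1
            continue
        src_term, tgt_term = match
        # Constraint token masking: hide the source term behind mask tokens.
        masked = rng.random() < mask_prob
        src_part = [MASK] * len(src_term) if masked else list(src_term)
        # Data augmentation: append the target term the model should copy.
        out += [TERM_START, *src_part, TERM_SEP, *tgt_term, TERM_END]
        i += len(src_term)
    return out


if __name__ == "__main__":
    sentence = ["the", "key", "rate", "rose"]
    terms = [(["key", "rate"], ["taux", "directeur"])]
    print(" ".join(annotate(sentence, terms, mask_prob=0.0)))
    print(" ".join(annotate(sentence, terms, mask_prob=1.0)))
```

In this sketch the annotated source sentence would be paired with the unchanged reference translation during training, so the only signal for producing the target term is the inlined constraint.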
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
