Developing neural machine translation models for Hungarian-English

Attila Nagy

arXiv:2111.04099·cs.CL·November 9, 2021

TL;DR

This paper develops neural machine translation models for Hungarian-English using the Hunglish2 corpus, introducing five dependency tree-based data augmentation methods that improve translation quality.

Contribution

The paper proposes five novel structure-aware data augmentation techniques for NMT that use the dependency trees of sentences to improve translation performance.

Findings

01. Hungarian-English BLEU score of 33.9

02. English-Hungarian BLEU score of 28.6

03. Dependency-aware augmentation improves translation quality

Abstract

I train models for the task of neural machine translation for English-Hungarian and Hungarian-English, using the Hunglish2 corpus. The main contribution of this work is evaluating different data augmentation methods during the training of NMT models. I propose 5 different augmentation methods that are structure-aware, meaning that instead of randomly selecting words for blanking or replacement, the dependency tree of sentences is used as a basis for augmentation. I start my thesis with a detailed literature review on neural networks, sequential modeling, neural machine translation, dependency parsing and data augmentation. After a detailed exploratory data analysis and preprocessing of the Hunglish2 corpus, I perform experiments with the proposed data augmentation techniques. The best model for Hungarian-English achieves a BLEU score of 33.9, while the best model for English-Hungarian…
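The abstract contrasts structure-aware augmentation with randomly selecting words for blanking, but it does not spell out the five proposed methods. The sketch below is therefore only one plausible illustration of the general idea, not the author's implementation: it blanks a whole dependency subtree instead of independent random tokens. The function names, the `<blank>` placeholder, the `max_fraction` parameter, and the hand-written head indices are all assumptions made for this example; in practice the tree would come from a dependency parser.

```python
import random
from typing import List

BLANK = "<blank>"  # hypothetical placeholder token, not taken from the paper


def subtree_indices(heads: List[int], root: int) -> List[int]:
    """Return the sorted indices of `root` and all of its descendants.

    heads[i] is the index of token i's syntactic head, or -1 for the
    sentence root.
    """
    children = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)
    stack, found = [root], []
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children[node])
    return sorted(found)


def blank_subtree(tokens: List[str], heads: List[int],
                  rng: random.Random, max_fraction: float = 0.5) -> List[str]:
    """Blank one randomly chosen subtree (assumed variant of the technique).

    Subtrees rooted at the sentence root are skipped so the whole sentence
    is never blanked, and a subtree is eligible only if it covers at most
    `max_fraction` of the tokens.
    """
    candidates = []
    for i, head in enumerate(heads):
        if head < 0:
            continue
        span = subtree_indices(heads, i)
        if len(span) <= max_fraction * len(tokens):
            candidates.append(span)
    if not candidates:
        return list(tokens)  # nothing eligible: return an unchanged copy
    chosen = set(rng.choice(candidates))
    return [BLANK if i in chosen else tok for i, tok in enumerate(tokens)]


# Hand-annotated toy tree: "chased" is the root, "dog" and "cat" attach to it.
tokens = ["The", "old", "dog", "chased", "a", "cat"]
heads = [2, 2, 3, -1, 5, 3]
print(blank_subtree(tokens, heads, random.Random(0)))
```

Because a whole subtree is blanked at once, the masked positions always form a syntactically coherent unit (for example "The old dog" or "a cat"), which is the property that distinguishes this kind of augmentation from token-level random blanking.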

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification