Developing neural machine translation models for Hungarian-English
Attila Nagy

TL;DR
This paper develops neural machine translation models for Hungarian-English and English-Hungarian using the Hunglish2 corpus, and introduces five dependency tree-based data augmentation methods that improve translation quality.
Contribution
It proposes five novel structure-aware data augmentation techniques for NMT. Each uses the sentence's dependency tree, rather than random selection, to decide which words to blank or replace.
Findings
Hungarian-English BLEU score of 33.9
English-Hungarian BLEU score of 28.6
Dependency-aware augmentation improves translation quality
Abstract
I train neural machine translation (NMT) models for English-Hungarian and Hungarian-English using the Hunglish2 corpus. The main contribution of this work is an evaluation of different data augmentation methods during NMT training. I propose five augmentation methods that are structure-aware: instead of randomly selecting words for blanking or replacement, they use the dependency tree of each sentence as the basis for augmentation. I start my thesis with a detailed literature review on neural networks, sequential modeling, neural machine translation, dependency parsing and data augmentation. After a detailed exploratory data analysis and preprocessing of the Hunglish2 corpus, I perform experiments with the proposed data augmentation techniques. The best model for Hungarian-English achieves a BLEU score of 33.9, while the best model for English-Hungarian…
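To make "structure-aware" concrete, here is a minimal sketch of one possible dependency-based augmentation: blanking a whole syntactic subtree instead of independently chosen random words. This is an illustration, not one of the author's five methods, whose exact definitions are not given here. It assumes each sentence is already parsed, with heads stored as in the CoNLL-U HEAD column (1-based, 0 for the root). The placeholder token `<blank>`, the helper names, and the `max_ratio` size cap are hypothetical choices made for this example.

```python
import random
from typing import List

BLANK = "<blank>"  # hypothetical placeholder token for masked words


def subtree(heads: List[int], root: int) -> List[int]:
    """Return sorted 0-based indices of all tokens dominated by `root` (inclusive).

    heads[i] is the 1-based head of token i+1; 0 marks the sentence root
    (CoNLL-U HEAD convention). `root` is a 0-based token index.
    """
    # Build a child list for every node, including the artificial root node 0.
    children = {i: [] for i in range(len(heads) + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)
    # Depth-first traversal from the chosen node, collecting 0-based indices.
    stack, nodes = [root + 1], []
    while stack:
        node = stack.pop()
        nodes.append(node - 1)
        stack.extend(children[node])
    return sorted(nodes)


def blank_subtree(tokens: List[str], heads: List[int], root: int) -> List[str]:
    """Replace every token in the subtree of `root` with BLANK (length is preserved)."""
    covered = set(subtree(heads, root))
    return [BLANK if i in covered else tok for i, tok in enumerate(tokens)]


def random_subtree_blanking(tokens: List[str], heads: List[int],
                            rng: random.Random, max_ratio: float = 0.3) -> List[str]:
    """Blank the subtree of a randomly chosen non-root token.

    Only subtrees covering at most `max_ratio` of the sentence are eligible,
    so the main predicate (the root) is never blanked. If no subtree
    qualifies, an unchanged copy of the sentence is returned.
    """
    candidates = [i for i, h in enumerate(heads)
                  if h != 0 and len(subtree(heads, i)) <= max_ratio * len(tokens)]
    if not candidates:
        return list(tokens)
    return blank_subtree(tokens, heads, rng.choice(candidates))


if __name__ == "__main__":
    # "The old man reads a book": 'reads' is the root; 'The' and 'old' attach to 'man'.
    tokens = ["The", "old", "man", "reads", "a", "book"]
    heads = [3, 3, 4, 0, 6, 4]
    print(blank_subtree(tokens, heads, 2))  # blanks the phrase "The old man"
    print(random_subtree_blanking(tokens, heads, random.Random(0), max_ratio=0.5))
```

Blanking `man` removes the whole noun phrase "The old man", while random word-level blanking could hit any token in isolation. In a parallel corpus, such a function would typically be applied to the source side only, with the target kept intact. The head indices could come from any dependency parser that emits CoNLL-U output.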
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
