Syntax-based data augmentation for Hungarian-English machine translation
Attila Nagy, Patrick Nanys, Balázs Frey Konrád, Bence Bial, Judit Ács

TL;DR
This paper trains Transformer-based neural machine translation models for Hungarian-English and English-Hungarian on the Hunglish2 corpus, reaching BLEU scores of 40.0 and 33.4 respectively, and explores syntax-based data augmentation as ongoing work. Code and models are publicly available.
Contribution
It trains Transformer-based models for both translation directions on the Hunglish2 corpus and presents early results from ongoing work on syntax-based augmentation for neural machine translation.
Findings
Best models achieved BLEU scores of 40.0 (Hungarian-English) and 33.4 (English-Hungarian)
Syntax-based augmentation is explored as ongoing work, with early results reported
Code and models are publicly available for further research
Abstract
We train Transformer-based neural machine translation models for Hungarian-English and English-Hungarian using the Hunglish2 corpus. Our best models achieve a BLEU score of 40.0 on Hungarian-English and 33.4 on English-Hungarian. Furthermore, we present results from ongoing work on syntax-based augmentation for neural machine translation. Both our code and models are publicly available.
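For readers unfamiliar with the metric behind the 40.0 and 33.4 figures, the following is a minimal sketch of corpus-level BLEU: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is an illustration only. The function names `ngrams` and `corpus_bleu` are hypothetical, it assumes whitespace-tokenized input with one reference per sentence, and it is not the evaluation code used in the paper. Published scores are normally computed with a standard tool such as sacreBLEU, whose tokenization choices affect the result.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumes one reference per hypothesis and pre-tokenized input
    (lists of tokens). No smoothing is applied.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    hypothesis = "the cat sat on a mat".split()
    print(round(corpus_bleu([hypothesis], [reference]), 1))
```

Because the counts are pooled over the whole test set before the precisions are computed, corpus-level BLEU is not the average of per-sentence scores.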
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
