Syntax-based data augmentation for Hungarian-English machine translation

Attila Nagy; Patrick Nanys; Balázs Frey Konrád; Bence Bial; Judit Ács

arXiv:2201.06876·cs.CL·January 19, 2022

Open Access · 2 Repos

TL;DR

This paper trains Transformer-based neural machine translation models for Hungarian-English and English-Hungarian on the Hunglish2 corpus, reaching BLEU scores of 40.0 and 33.4 respectively, and reports on ongoing work on syntax-based data augmentation. The code and models are publicly available.

Contribution

It trains Transformer-based translation models in both directions between Hungarian and English on the Hunglish2 corpus and presents results from ongoing work on syntax-based augmentation for neural machine translation.

Findings

01

Best models achieved BLEU scores of 40.0 (Hungarian-English) and 33.4 (English-Hungarian)

02

Syntax-based augmentation for neural machine translation is presented as ongoing work, with results reported so far

03

Code and models are publicly available for further research

Abstract

We train Transformer-based neural machine translation models for Hungarian-English and English-Hungarian using the Hunglish2 corpus. Our best models achieve a BLEU score of 40.0 on Hungarian-English and 33.4 on English-Hungarian. Furthermore, we present results from ongoing work on syntax-based augmentation for neural machine translation. Both our code and models are publicly available.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification