Combining Advanced Methods in Japanese-Vietnamese Neural Machine Translation

Thi-Vinh Ngo; Thanh-Le Ha; Phuong-Thai Nguyen; Le-Minh Nguyen

arXiv:1805.07133·cs.CL·December 17, 2020

TL;DR

This paper builds neural machine translation systems for Japanese-Vietnamese, a low-resource language pair, and combines advanced techniques, including a variant of Byte-Pair Encoding, to improve translation quality despite scarce bilingual data.

Contribution

It presents what the authors describe as the first NMT systems for Japanese-Vietnamese and proposes a variant of Byte-Pair Encoding to improve Vietnamese word segmentation and alleviate the rare-word problem.

Findings

01. Significant translation quality improvements achieved

02. Effective Vietnamese word segmentation method proposed

03. First NMT system for Japanese-Vietnamese language pair

Abstract

Neural machine translation (NMT) systems have recently achieved state-of-the-art results for many popular language pairs because of the availability of data. For low-resource language pairs, there has been little research in this field due to the lack of bilingual data. In this paper, we attempt to build the first NMT systems for a low-resource language pair: Japanese-Vietnamese. We also show significant improvements when combining advanced methods to reduce the adverse impacts of data sparsity and improve the quality of NMT systems. In addition, we propose a variant of the Byte-Pair Encoding algorithm to perform effective word segmentation for Vietnamese texts and alleviate the rare-word problem that persists in NMT systems.
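
For readers unfamiliar with Byte-Pair Encoding, the following is a minimal sketch of the standard BPE merge-learning procedure, not the variant proposed in the paper, whose details are not given on this page. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word frequencies are illustrative assumptions.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically
        # (an arbitrary choice made here for determinism).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged = merge_pair(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy example (hypothetical data): frequent character sequences become single units.
toy_merges = learn_bpe({"ab": 10, "abc": 1}, num_merges=2)
print(toy_merges)
print(segment("abd", toy_merges))
```

Because an unseen word can always be broken down into smaller known units, this kind of subword segmentation is one common way to reduce the rare-word problem the abstract refers to.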

Code & Models

Repositories

ngovinhtn/JaViCorpus
