TL;DR
This paper builds neural machine translation systems for the low-resource Japanese-Vietnamese language pair, combining advanced techniques, including a new Byte-Pair Encoding variant, to improve translation quality despite data scarcity.
Contribution
It presents what the authors describe as the first NMT system for Japanese-Vietnamese and proposes a Byte-Pair Encoding variant that segments Vietnamese text into words and alleviates the rare-word problem.
Findings
Significant translation quality improvements achieved
Effective Vietnamese word segmentation method proposed
First NMT system for Japanese-Vietnamese language pair
Abstract
Neural machine translation (NMT) systems have recently achieved state-of-the-art results for many popular language pairs, thanks to the availability of data. For low-resource language pairs, there has been little research in this field due to the lack of bilingual data. In this paper, we attempt to build the first NMT systems for a low-resource language pair: Japanese-Vietnamese. We also show significant improvements when combining advanced methods to reduce the adverse impact of data sparsity and improve the quality of NMT systems. In addition, we propose a variant of the Byte-Pair Encoding algorithm to perform effective word segmentation for Vietnamese texts and alleviate the rare-word problem that persists in NMT systems.
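For readers unfamiliar with Byte-Pair Encoding, the following is a minimal sketch of the standard merge-learning step, not the variant proposed in the paper, whose details are not given in the abstract. The function name `learn_bpe`, the `</w>` end-of-word marker, and the toy input are assumptions made for illustration only.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules with plain BPE (hypothetical helper)."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# Toy input, not data from the paper.
print(learn_bpe(["abab", "abab", "ab"], 2))
```

Frequent character sequences are merged into single subword units, so a rare word can still be represented as a sequence of known pieces. This is the general mechanism by which BPE-style segmentation eases the rare-word problem.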
