On the Use of Machine Translation-Based Approaches for Vietnamese Diacritic Restoration
Thai-Hoang Pham, Xuan-Khoai Pham, Phuong Le-Hong

TL;DR
This study compares phrase-based and neural machine translation methods for Vietnamese diacritic restoration. The neural method is about twice as fast at inference but slightly less accurate (96.15% vs. 97.32%), and it has room for improvement through pre-trained word embeddings and more training data.
Contribution
It is the first to apply neural machine translation to Vietnamese diacritic restoration and provides a comprehensive comparison with the existing phrase-based approach.
Findings
Phrase-based approach achieves 97.32% accuracy.
Neural-based approach achieves 96.15% accuracy.
Neural method is approximately twice as fast as the phrase-based method at inference.
Abstract
This paper presents an empirical study of two machine translation-based approaches to the Vietnamese diacritic restoration problem: phrase-based and neural-based machine translation models. This is the first work that applies a neural-based machine translation method to this problem and gives a thorough comparison with the phrase-based machine translation method, which is the current state-of-the-art method for this problem. On a large dataset, the phrase-based approach has an accuracy of 97.32%, while that of the neural-based approach is 96.15%. Although the neural-based method has a slightly lower accuracy, it is about twice as fast as the phrase-based method in terms of inference speed. Moreover, the neural-based machine translation method has considerable room for future improvement, such as incorporating pre-trained word embeddings and collecting more training data.
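To make the translation framing concrete, the sketch below shows one way parallel training pairs could be built: the "source" side is a sentence with its diacritics stripped, and the "target" side is the original sentence. It also includes a simple per-token accuracy measure. This is an illustration only, not the authors' pipeline: the function names (`strip_diacritics`, `make_pair`, `token_accuracy`) are hypothetical, and scoring accuracy over whitespace-separated tokens is an assumption, since the abstract does not say how accuracy is computed.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics, e.g. 'Tiếng Việt' -> 'Tieng Viet'."""
    # 'đ' and 'Đ' have no Unicode canonical decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (tone marks, circumflex, breve, horn).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_pair(sentence: str) -> tuple[str, str]:
    """Build a (source, target) pair for a translation-style model."""
    return strip_diacritics(sentence), sentence


def token_accuracy(predicted: str, reference: str) -> float:
    """Fraction of whitespace-separated tokens restored exactly (assumed metric)."""
    # Normalize to NFC so equivalent encodings of the same letter compare equal.
    pred_tokens = unicodedata.normalize("NFC", predicted).split()
    ref_tokens = unicodedata.normalize("NFC", reference).split()
    if len(pred_tokens) != len(ref_tokens):
        raise ValueError("prediction and reference must have the same token count")
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    correct = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return correct / len(ref_tokens)


if __name__ == "__main__":
    source, target = make_pair("Tiếng Việt rất đẹp")
    print(source)  # input given to the model
    print(target)  # output the model should produce
    print(token_accuracy("Tiếng Viet rất đẹp", target))
```

Because stripping diacritics never changes the number of tokens, source and target are aligned one-to-one, which is what makes a token-level accuracy figure straightforward to define under this assumption.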
