TL;DR
This paper introduces a hierarchical Transformer model for Vietnamese spelling correction that uses both character-level and word-level information, along with a new dataset collected from real-life texts. The model outperforms the compared methods and publicly available systems in recall, precision, and F1-score.
Contribution
The paper presents a hierarchical Transformer architecture built from multiple Transformer encoders, together with a realistic Vietnamese spelling correction dataset intended to support future work on the task.
Findings
Outperforms existing methods in recall, precision, and F1-score
Introduces a new real-life Vietnamese spelling correction dataset
Demonstrates effectiveness of hierarchical Transformer approach
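The summary does not state how recall, precision, and F1-score were computed, so the following is only a minimal sketch of one common token-level convention, not the paper's evaluation protocol. It assumes the source, reference, and predicted sentences are already aligned token by token; the function name `correction_scores` and the example sentence are hypothetical.

```python
def correction_scores(source, reference, prediction):
    """Token-level precision, recall, and F1 for spelling correction.

    Assumption (not from the paper): the three token lists are aligned
    one-to-one, so position i refers to the same word in each list.
    """
    if not (len(source) == len(reference) == len(prediction)):
        raise ValueError("token sequences must be aligned and equal in length")

    tp = fp = fn = 0
    for src, ref, pred in zip(source, reference, prediction):
        if pred != src:
            # The system changed this token.
            if pred == ref:
                tp += 1  # an error was fixed correctly
            else:
                fp += 1  # a wrong change (on a correct or an incorrect token)
        if src != ref and pred != ref:
            fn += 1      # an error is still wrong in the output

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical example: one real error ("hocj") is fixed,
# and one correct token ("đang") is wrongly changed.
source = ["tôi", "đang", "hocj", "bài"]
reference = ["tôi", "đang", "học", "bài"]
prediction = ["tôi", "đag", "học", "bài"]
precision, recall, f1 = correction_scores(source, reference, prediction)
```

In this example the system makes two changes, one of them correct, and leaves no error unfixed, so precision is 0.5 and recall is 1.0.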
Abstract
In this paper, we propose a Hierarchical Transformer model for the Vietnamese spelling correction problem. The model consists of multiple Transformer encoders and uses both character-level and word-level information to detect errors and make corrections. In addition, to facilitate future work on Vietnamese spelling correction, we propose a realistic dataset collected from real-life texts. We compare our method with other methods and publicly available systems. The proposed method outperforms all of the contemporary methods in terms of recall, precision, and F1-score. A demo version is publicly available.
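To make the two levels concrete, here is a minimal sketch of how a sentence might be prepared as both a word sequence and a padded per-word character grid. The abstract does not describe the model's actual input encoding, so the function `hierarchical_view`, the `<pad>` symbol, the whitespace tokenization, and the NFC normalization step are all assumptions made for illustration.

```python
import unicodedata

PAD = "<pad>"  # hypothetical padding symbol


def hierarchical_view(sentence, max_word_len=None):
    """Return (words, chars): a word list and a padded character grid.

    Assumptions (not from the paper): words are split on whitespace, and
    text is NFC-normalized so each Vietnamese letter with diacritics is
    a single code point.
    """
    sentence = unicodedata.normalize("NFC", sentence)
    words = sentence.split()
    if max_word_len is None:
        max_word_len = max((len(w) for w in words), default=0)

    chars = []
    for word in words:
        row = list(word[:max_word_len])            # truncate long words
        row += [PAD] * (max_word_len - len(row))   # pad short words
        chars.append(row)
    return words, chars


words, chars = hierarchical_view("tôi đang học")
```

Here `words` would feed a word-level encoder, while each row of `chars` would feed a character-level encoder for the corresponding word. How the paper combines the two representations is not stated in this summary.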
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Residual Connection · Layer Normalization · Byte Pair Encoding
