A Fast and Accurate Vietnamese Word Segmenter
Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, Mark Johnson

TL;DR
This paper introduces a Vietnamese word segmentation method based on Single Classification Ripple Down Rules that is both more accurate and faster than previous state-of-the-art tools. An open-source implementation is available.
Contribution
The paper presents a new Vietnamese word segmentation approach using Ripple Down Rules, improving both accuracy and efficiency over existing methods.
Findings
Outperforms JVnSegmenter, vnTokenizer, DongDu, and UETsegmenter in accuracy on the benchmark Vietnamese treebank
Segments text faster than those four tools
Open-source code is available at https://github.com/datquocnguyen/RDRsegmenter
Abstract
We propose a novel approach to Vietnamese word segmentation. Our approach is based on the Single Classification Ripple Down Rules methodology (Compton and Jansen, 1990), where rules are stored in an exception structure and new rules are only added to correct segmentation errors given by existing rules. Experimental results on the benchmark Vietnamese treebank show that our approach outperforms previous state-of-the-art approaches JVnSegmenter, vnTokenizer, DongDu and UETsegmenter in terms of both accuracy and performance speed. Our code is open-source and available at: https://github.com/datquocnguyen/RDRsegmenter.
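The exception structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Rule` class, the `classify` and `add_correction` helpers, the feature names `prev` and `cur`, and the `B`/`I` (begin-word / inside-word) labels are all assumptions chosen for illustration. It shows only the general Single Classification Ripple Down Rules idea, in which each rule has an "except" branch followed when the rule fires and an "else" branch followed when it does not, and a new rule is attached only at the point where an existing conclusion turned out to be wrong.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """One node in a hypothetical SCRDR tree (names are illustrative)."""
    condition: Callable[[dict], bool]
    conclusion: str
    except_rule: Optional["Rule"] = None  # followed when this rule fires
    else_rule: Optional["Rule"] = None    # followed when it does not fire


def classify(root: Rule, case: dict):
    """Walk the tree; the conclusion comes from the last rule that fired.

    Returns (conclusion, last_evaluated_rule, whether_that_rule_fired).
    """
    node, conclusion = root, None
    last, fired = None, False
    while node is not None:
        last = node
        fired = node.condition(case)
        if fired:
            conclusion = node.conclusion
            node = node.except_rule
        else:
            node = node.else_rule
    return conclusion, last, fired


def add_correction(root: Rule, case: dict,
                   condition: Callable[[dict], bool], conclusion: str) -> None:
    """Attach a new rule at the end of the path taken by a misclassified case."""
    _, last, fired = classify(root, case)
    new_rule = Rule(condition, conclusion)
    if fired:
        last.except_rule = new_rule  # exception to the rule that fired last
    else:
        last.else_rule = new_rule    # alternative tried after a non-firing rule


# Default rule (assumed): every syllable begins a new word.
root = Rule(condition=lambda c: True, conclusion="B")

# Suppose the default wrongly splits "học sinh"; add an exception for it.
wrong_case = {"prev": "học", "cur": "sinh"}
add_correction(
    root, wrong_case,
    condition=lambda c: c["prev"] == "học" and c["cur"] == "sinh",
    conclusion="I",
)
```

After the correction, `classify(root, wrong_case)[0]` yields `"I"`, while cases that do not match the new condition still fall back to the default `"B"`. Existing rules are never edited, which is the property the abstract describes: new rules are added only to correct errors made by existing ones.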
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
