Vietnamese Word Segmentation with SVM: Ambiguity Reduction and Suffix Capture

Duc-Vu Nguyen; Dang Van Thin; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen

arXiv:2006.07804 · cs.CL · June 16, 2020

TL;DR

This paper presents a Vietnamese word segmentation method based on a Support Vector Machine (SVM) classifier, with features that reduce overlap ambiguity and capture suffixes. It achieves a higher F1-score than the state-of-the-art methods UETsegmenter and RDRsegmenter without relying on a longest matching algorithm.
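The abstract frames segmentation as binary classification: for every gap between two adjacent syllables, decide whether they belong to the same word. A minimal sketch of that framing follows, assuming the common convention of joining the syllables of a multi-syllable word with underscores (for example Học_sinh). The label convention (1 = join, 0 = boundary) and the names encode and decode are illustrative assumptions, not taken from the paper or the UITws-v1 code.

```python
from typing import List, Tuple


def encode(segmented: str) -> Tuple[List[str], List[int]]:
    """Turn an underscore-segmented sentence into syllables and gap labels.

    labels[i] describes the gap between syllables[i] and syllables[i + 1]:
    1 = same word (join), 0 = word boundary. (Hypothetical convention.)
    """
    syllables: List[str] = []
    labels: List[int] = []
    for word in segmented.split():
        parts = word.split("_")
        if syllables:
            labels.append(0)  # gap between the previous word and this one
        labels.extend([1] * (len(parts) - 1))  # gaps inside this word
        syllables.extend(parts)
    return syllables, labels


def decode(syllables: List[str], labels: List[int]) -> str:
    """Inverse of encode: rebuild the segmented sentence from gap labels."""
    if not syllables:
        return ""
    if len(labels) != len(syllables) - 1:
        raise ValueError("expected exactly one label per gap")
    out = [syllables[0]]
    for syllable, joined in zip(syllables[1:], labels):
        out.append("_" if joined else " ")
        out.append(syllable)
    return "".join(out)
```

For example, encode("Học_sinh đi học") yields the syllables ["Học", "sinh", "đi", "học"] with gap labels [1, 0, 0], and decode maps them back to the original string, so a classifier only has to predict one label per gap.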

Contribution

The paper introduces two novel feature extraction techniques for SVM-based Vietnamese word segmentation: one reduces overlap ambiguity, and the other improves prediction of unknown words that contain suffixes.
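The novel features are not spelled out on this page, so the sketch below covers only the inherited features named in the abstract: syllable n-grams, syllable-type n-grams, and a dictionary check on adjacent syllables. It is a hypothetical reconstruction, not the authors' feature set: the window size, the type scheme in syllable_type, the lowercasing, the feature-name strings, and the lowercase, space-separated dictionary format are all assumptions for illustration.

```python
from typing import List, Set


def syllable_type(s: str) -> str:
    """Coarse syllable type; a hypothetical scheme for illustration."""
    if s.isdigit():
        return "NUM"
    if not any(ch.isalnum() for ch in s):
        return "PUNCT"
    if s.isupper():
        return "UPPER"
    if s[:1].isupper():
        return "CAP"
    if s.islower():
        return "LOWER"
    return "OTHER"


def gap_features(syllables: List[str], i: int, dictionary: Set[str],
                 window: int = 2) -> Set[str]:
    """Binary features for the gap between syllables[i] and syllables[i + 1].

    Offsets are relative to syllables[i]: offset 0 is the syllable left of
    the gap, offset 1 the syllable right of it.
    """
    n = len(syllables)

    def syl(k: int) -> str:
        # Lowercased syllable, or a boundary marker outside the sentence.
        if 0 <= k < n:
            return syllables[k].lower()
        return "<s>" if k < 0 else "</s>"

    def typ(k: int) -> str:
        return syllable_type(syllables[k]) if 0 <= k < n else "PAD"

    feats: Set[str] = set()
    lo, hi = i - window + 1, i + window  # `window` syllables on each side
    # Syllable and syllable-type unigrams.
    for k in range(lo, hi + 1):
        feats.add(f"s[{k - i}]={syl(k)}")
        feats.add(f"t[{k - i}]={typ(k)}")
    # Syllable and syllable-type bigrams over adjacent positions.
    for k in range(lo, hi):
        feats.add(f"s[{k - i}:{k - i + 1}]={syl(k)}_{syl(k + 1)}")
        feats.add(f"t[{k - i}:{k - i + 1}]={typ(k)}_{typ(k + 1)}")
    # Dictionary checks: is a 2- or 3-syllable span crossing the gap a known word?
    for length in (2, 3):
        for start in range(i + 2 - length, i + 1):
            end = start + length
            if start >= 0 and end <= n:
                word = " ".join(s.lower() for s in syllables[start:end])
                if word in dictionary:
                    feats.add(f"dict{length}[{start - i}]")
    return feats
```

For the gap between Học and sinh with the dictionary {"học sinh"}, the returned set includes s[0]=học, s[1]=sinh, t[0]=CAP, and dict2[0]; the gap between sinh and đi gets no dictionary feature.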

Findings

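1. Achieves a higher F1-score than UETsegmenter and RDRsegmenter on benchmark Vietnamese datasets.
2. The proposed features reduce overlap ambiguity and improve prediction of unknown words containing suffixes.
3. The method requires neither a longest matching initial step nor any post-processing.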
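For context on the last finding, here is a sketch of greedy forward longest matching, the kind of dictionary-based initial step that, according to the abstract, UETsegmenter and RDRsegmenter use and this method omits. It assumes a lowercase, space-separated dictionary and a hypothetical cap of four syllables per word (max_len); the baselines' actual implementations may differ. The classic phrase "học sinh học sinh học" (students study biology) shows how the greedy choice fails under overlap ambiguity.

```python
from typing import List, Set


def longest_matching(syllables: List[str], dictionary: Set[str],
                     max_len: int = 4) -> List[str]:
    """Greedy forward longest matching over syllables.

    At each position, take the longest span (up to max_len syllables) whose
    lowercased, space-joined form is in the dictionary; a single syllable is
    always accepted as a fallback.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    i, n = 0, len(syllables)
    while i < n:
        # Try the longest candidate first and shrink until a match.
        for length in range(min(max_len, n - i), 0, -1):
            candidate = " ".join(syllables[i:i + length]).lower()
            if length == 1 or candidate in dictionary:
                words.append("_".join(syllables[i:i + length]))
                i += length
                break
    return words
```

With the dictionary {"học sinh", "sinh học"}, the greedy pass returns ["học_sinh", "học_sinh", "học"] instead of the correct "học_sinh học sinh_học", because it commits to the first long match and never reconsiders the overlapping "sinh học".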

Abstract

In this paper, we approach Vietnamese word segmentation as a binary classification task using a Support Vector Machine (SVM) classifier. We inherit features from prior work, such as syllable n-grams, syllable-type n-grams, and checks of whether the conjunction of adjacent syllables appears in the dictionary. We propose two novel feature extraction methods: one to reduce overlap ambiguity and the other to improve the prediction of unknown words containing suffixes. Unlike UETsegmenter and RDRsegmenter, two state-of-the-art Vietnamese word segmentation methods, we do not use the longest matching algorithm as an initial processing step or any post-processing technique. According to experimental results on benchmark Vietnamese datasets, our proposed method obtained a better F1-score than the prior state-of-the-art methods UETsegmenter and RDRsegmenter.
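To make the classification step concrete, here is a minimal linear SVM over sparse binary features, trained with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. This solver is swapped in for illustration because it needs no dependencies; the page does not say which SVM implementation, kernel, or hyperparameters the authors used, and train_pegasos, predict, lam, epochs, and the toy feature strings are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

BIAS = "<bias>"  # constant feature, so the separating hyperplane has an offset


def _dot(w: Dict[str, float], feats: Set[str]) -> float:
    # Sparse dot product: every present binary feature has value 1.
    return sum(w.get(f, 0.0) for f in feats)


def train_pegasos(data: List[Tuple[Set[str], int]], lam: float = 0.01,
                  epochs: int = 10) -> Dict[str, float]:
    """Linear SVM (L2-regularized hinge loss) via Pegasos-style updates.

    Each example is (set of binary features, y) with y = +1 for "join the
    syllables across this gap" and y = -1 for "word boundary". Examples are
    visited in a fixed order for reproducibility; Pegasos proper samples
    them at random.
    """
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for feats, y in data:
            t += 1
            x = set(feats) | {BIAS}
            eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
            margin = y * _dot(w, x)
            # Regularization step: shrink all weights by (1 - eta * lambda).
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss subgradient step, only when the margin is violated.
            if margin < 1.0:
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


def predict(w: Dict[str, float], feats: Iterable[str]) -> int:
    """Return +1 (join) if the score is positive, else -1 (boundary)."""
    return 1 if _dot(w, set(feats) | {BIAS}) > 0 else -1
```

Each training example is one syllable gap; at inference, the sign of the score decides whether the two syllables are joined. On a toy set where joined gaps carry a dictionary-hit feature, the trained weights classify the training gaps correctly and also join an unseen gap that has a dictionary hit. At realistic corpus sizes, an optimized solver such as LIBLINEAR would typically replace this loop.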

Code & Models

Repositories

ngannlt/UITws-v1 (official)