# State-of-the-Art Vietnamese Word Segmentation

**Authors:** Song Nguyen Duc Cong, Quoc Hung Ngo, Rachsuda Jiamthapthaksin

arXiv: 1906.07662 · 2019-06-19

## TL;DR

This paper reviews current Vietnamese word segmentation methods, covering corpus creation, machine learning approaches, and existing tools, and highlights achievements and limitations in the field.

## Contribution

It provides a comprehensive overview of Vietnamese word segmentation techniques, including corpus development and machine learning applications, with insights into current challenges.

## Key findings

- Machine learning techniques improve segmentation accuracy.
- Existing tools have notable limitations.
- Building quality corpora is crucial for progress.

## Abstract

Word segmentation is the first step of any task in Vietnamese language processing. This paper reviews state-of-the-art approaches and systems for word segmentation in Vietnamese. To give an overview of all stages from building corpora to developing toolkits, we discuss the corpus-building stage, the approaches applied to word segmentation, and existing toolkits for segmenting words in Vietnamese sentences. In addition, this study clearly shows the motivations for building corpora and applying machine learning techniques to improve the accuracy of Vietnamese word segmentation. Based on our observations, this study also reports some achievements and limitations of existing Vietnamese word segmentation systems.
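To make the task concrete: written Vietnamese separates syllables with spaces, and a word may span one or more syllables, so a segmenter has to decide which adjacent syllables belong together. The sketch below is a minimal greedy longest-matching baseline, not a method taken from the paper. The function name `segment`, the toy `LEXICON`, and the `max_len` parameter are all assumptions made for illustration.

```python
def segment(syllables, lexicon, max_len=4):
    """Greedy longest-matching segmentation (illustrative baseline only).

    syllables: list of whitespace-separated syllables
    lexicon:   set of known multi-syllable words, syllables joined by spaces
    max_len:   longest word, in syllables, that is tried (assumed value)
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first and shrink until the lexicon matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate.lower() in lexicon:
                # Join the syllables of one word with underscores.
                words.append("_".join(syllables[i:i + n]))
                i += n
                break
    return words


# Toy lexicon, hypothetical and far smaller than any real dictionary.
LEXICON = {"học sinh", "sinh học"}

print(segment("học sinh học sinh học".split(), LEXICON))
# → ['học_sinh', 'học_sinh', 'học']
```

The output shows why dictionary matching alone falls short: the intended reading of this sentence ("students study biology") is `học_sinh học sinh_học`, but the greedy pass commits to `học_sinh` twice. Overlap ambiguities of this kind are one reason to use annotated corpora and machine learning models.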

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.07662/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1906.07662/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1906.07662/full.md

---
Source: https://tomesphere.com/paper/1906.07662