Chinese Word Segmentation: Another Decade Review (2007-2017)

Hai Zhao; Deng Cai; Changning Huang; Chunyu Kit

arXiv:1901.06079·cs.CL·January 21, 2019·19 cites

Open Access

TL;DR

This review examines the progress of Chinese word segmentation from 2007 to 2017. It finds that neural methods have shown no superior performance over traditional supervised methods, and that balancing the recognition of in-vocabulary and out-of-vocabulary words remains the central challenge.

Contribution

It provides a comprehensive overview of CWS development over a decade, focusing on deep learning integration and identifying key challenges and future prospects.

Findings

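01. Neural network methods have not yet outperformed traditional supervised learning approaches.

02. Balancing the recognition of in-vocabulary (IV) and out-of-vocabulary (OOV) words remains the most critical challenge.

03. Neural models remain promising for future improvement because they have the potential to capture the essential linguistic structure of natural language.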

Abstract

This paper reviews the development of Chinese word segmentation (CWS) in the most recent decade, 2007-2017. Special attention is paid to the deep learning technologies that have already permeated most areas of natural language processing (NLP). The basic view we have arrived at is that, compared to traditional supervised learning methods, neural network based methods have not shown any superior performance. The most critical challenge still lies in balancing the recognition of in-vocabulary (IV) and out-of-vocabulary (OOV) words. However, as neural models have the potential to capture the essential linguistic structure of natural language, we are optimistic that significant progress may arrive in the near future.
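To make the IV/OOV trade-off concrete, the sketch below shows one common way to measure it: compute recall separately for gold words that appear in the training vocabulary (IV) and those that do not (OOV). This is an illustrative sketch, not the paper's evaluation code; the function names `to_spans` and `iv_oov_recall`, the toy sentence, and the toy vocabulary are all assumptions introduced here.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def iv_oov_recall(gold, pred, train_vocab):
    """Return (IV recall, OOV recall) for one sentence.

    A gold word counts as recalled only if the predicted segmentation
    contains exactly the same character span. A recall is None when the
    sentence has no gold word of that kind.
    """
    pred_spans = to_spans(pred)
    hits = {"iv": 0, "oov": 0}
    totals = {"iv": 0, "oov": 0}
    start = 0
    for w in gold:
        kind = "iv" if w in train_vocab else "oov"
        totals[kind] += 1
        if (start, start + len(w)) in pred_spans:
            hits[kind] += 1
        start += len(w)

    def recall(kind):
        return hits[kind] / totals[kind] if totals[kind] else None

    return recall("iv"), recall("oov")


# Hypothetical toy data: the segmenter splits the unseen word "自然语言"
# into two known words, so IV recall is perfect and OOV recall is zero.
gold = ["我", "喜欢", "自然语言", "处理"]
pred = ["我", "喜欢", "自然", "语言", "处理"]
train_vocab = {"我", "喜欢", "自然", "语言", "处理"}
iv_r, oov_r = iv_oov_recall(gold, pred, train_vocab)
```

In this toy case a segmenter biased toward known words scores 1.0 on IV recall and 0.0 on OOV recall, which is the kind of imbalance the abstract refers to.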

Taxonomy

Topics: Natural Language Processing Techniques