Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling
Zhiqing Sun, Zhi-Hong Deng

TL;DR
This paper introduces a neural segmental language model for unsupervised Chinese word segmentation. The model explicitly represents segments and achieves results competitive with state-of-the-art statistical methods.
Contribution
To the authors' knowledge, this is the first neural model for unsupervised Chinese word segmentation, and it explicitly captures segmental structure.
Findings
Achieves competitive performance on four datasets from SIGHAN 2005.
Presented by the authors as the first neural approach to unsupervised Chinese word segmentation.
Demonstrates the effectiveness of segmental language modeling.
Abstract
Previous approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former use carefully designed goodness measures to score candidate segmentations, while the latter search for the segmentation with the highest generative probability. However, while discriminative models can be trivially extended into neural versions by using neural language models, extending generative models is non-trivial. In this paper, we propose segmental language models (SLMs) for CWS. Our approach explicitly models the segmental nature of Chinese while preserving several properties of language models. In SLMs, a context encoder encodes the previous context and a segment decoder generates each segment incrementally. As far as we know, we are the first to propose a neural model for…
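The generative view described in the abstract, choosing the segmentation with the highest probability, can be illustrated with a small dynamic-programming search. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the decoding procedure, and the names `best_segmentation`, `toy_segment_logprob`, `TOY_LOGPROBS`, and the `max_len` cap are all hypothetical. In particular, the toy scorer below is a context-free lookup table standing in for the neural segment decoder, which in an SLM would condition on the encoded previous context.

```python
import math
from typing import Callable, List


def best_segmentation(
    text: str,
    segment_logprob: Callable[[str], float],
    max_len: int = 4,  # assumed cap on segment length, for illustration only
) -> List[str]:
    """Return the segmentation of `text` with the highest total log-probability."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last segment
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + segment_logprob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Follow the back-pointers from the end of the string to recover segments.
    segments: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(text[start:end])
        end = start
    segments.reverse()
    return segments


# Hypothetical stand-in for a learned segment model: a fixed lookup table.
TOY_LOGPROBS = {
    "北京": math.log(0.4),
    "大学": math.log(0.3),
    "北": math.log(0.05),
    "京": math.log(0.05),
    "大": math.log(0.1),
    "学": math.log(0.1),
}


def toy_segment_logprob(segment: str) -> float:
    # Unknown segments get a small floor probability instead of zero.
    return TOY_LOGPROBS.get(segment, math.log(1e-6))


print(best_segmentation("北京大学", toy_segment_logprob))  # ['北京', '大学']
```

With the toy table, the two-word split scores log(0.4 × 0.3), which beats both the character-by-character split and the single four-character segment. Swapping `toy_segment_logprob` for a context-dependent scorer would require passing the preceding characters to the scoring function as well.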
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
