Neural Word Segmentation Learning for Chinese
Deng Cai, Hai Zhao

TL;DR
This paper introduces a neural framework for Chinese word segmentation that eliminates the need for fixed context windows and leverages complete segmentation history, achieving competitive results without feature engineering.
Contribution
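The proposed model combines a gated combination neural network, which builds distributed representations of word candidates from their characters, with an LSTM language scoring model that draws on the complete segmentation history. It achieves results competitive with or better than previous state-of-the-art methods.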
Findings
Achieves performance competitive with or better than previous state-of-the-art methods on benchmark datasets.
Eliminates reliance on feature engineering.
Utilizes complete segmentation history for better accuracy.
Abstract
Most previous approaches to Chinese word segmentation formalize this problem as a character-based sequence labeling task where only contextual information within fixed-size local windows and simple interactions between adjacent tags can be captured. In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize complete segmentation history. Our model employs a gated combination neural network over characters to produce distributed representations of word candidates, which are then given to a long short-term memory (LSTM) language scoring model. Experiments on the benchmark datasets show that, without the feature engineering that most existing approaches rely on, our models achieve performance competitive with or better than previous state-of-the-art methods.
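The pipeline the abstract describes (compose characters into word-candidate vectors, then score a whole segmentation against its history) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the names `segmentations`, `compose_word`, `score_segmentation`, and `best_segmentation` are hypothetical, the embeddings are random and untrained, the gate is a simple softmax mix with no learned parameters, a plain tanh recurrence stands in for the LSTM, and exhaustive enumeration stands in for whatever search procedure the paper uses.

```python
import math
import random
from typing import Dict, List

DIM = 4  # hypothetical embedding size, chosen only for illustration


def segmentations(chars: str, max_len: int) -> List[List[str]]:
    """Enumerate every split of chars into words of at most max_len characters."""
    if not chars:
        return [[]]
    results = []
    for k in range(1, min(max_len, len(chars)) + 1):
        for rest in segmentations(chars[k:], max_len):
            results.append([chars[:k]] + rest)
    return results


def make_embeddings(chars: str, seed: int = 0) -> Dict[str, List[float]]:
    """Random, untrained character vectors (assumption: a real model learns these)."""
    rng = random.Random(seed)
    return {c: [rng.uniform(-1, 1) for _ in range(DIM)] for c in sorted(set(chars))}


def compose_word(word: str, emb: Dict[str, List[float]]) -> List[float]:
    """Toy gated combination: per dimension, a softmax-weighted mix of character vectors."""
    vecs = [emb[c] for c in word]
    out = []
    for d in range(DIM):
        gates = [math.exp(v[d]) for v in vecs]  # unnormalized gate per character
        total = sum(gates)
        out.append(sum(g / total * v[d] for g, v in zip(gates, vecs)))
    return out


def score_segmentation(words: List[str], emb: Dict[str, List[float]]) -> float:
    """Score words left to right against a running history state.

    The state summarizes all previous words, so no fixed context window is used.
    A plain tanh recurrence is used here in place of an LSTM.
    """
    state = [0.0] * DIM
    score = 0.0
    for w in words:
        vec = compose_word(w, emb)
        score += sum(s * v for s, v in zip(state, vec))  # fit of this word to the history
        state = [math.tanh(0.5 * s + v) for s, v in zip(state, vec)]
    return score


def best_segmentation(chars: str, max_len: int = 4, seed: int = 0) -> List[str]:
    """Pick the highest-scoring segmentation by exhaustive enumeration."""
    emb = make_embeddings(chars, seed)
    return max(segmentations(chars, max_len), key=lambda ws: score_segmentation(ws, emb))
```

Calling `best_segmentation("中文分词")` returns one of the eight possible splits of the four characters. Because the vectors are untrained, the chosen split carries no linguistic meaning; the sketch only shows how scoring whole word sequences differs from tagging each character inside a local window.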
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
