Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation
Yushi Yao, Zheng Huang

TL;DR
This paper introduces a bi-directional LSTM neural network for Chinese word segmentation, achieving state-of-the-art results without manual feature engineering by effectively capturing context in both directions.
Contribution
The paper presents a bi-directional LSTM approach that eliminates the need for handcrafted features in Chinese word segmentation while reaching state-of-the-art performance on both traditional and simplified Chinese datasets.
Findings
Achieved state-of-the-art segmentation accuracy on Chinese datasets.
Outperformed traditional feature-based methods.
Effective in both traditional and simplified Chinese texts.
Abstract
Recurrent neural networks (RNNs) have been broadly applied to natural language processing (NLP) problems. This kind of neural network is designed for modeling sequential data and has been shown to be quite efficient in sequential tagging tasks. In this paper, we propose to use a bi-directional RNN with long short-term memory (LSTM) units for Chinese word segmentation, which is a crucial preprocessing task for modeling Chinese sentences and articles. Classical methods focus on designing and combining handcrafted features from context, whereas a bi-directional LSTM network (BLSTM) does not need any prior knowledge or pre-designing, and it is adept at keeping contextual information in both directions. Experimental results show that our approach achieves state-of-the-art performance in word segmentation on both traditional Chinese datasets and simplified Chinese datasets.
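To make the phrase "contextual information in both directions" concrete, here is a minimal pure-Python sketch of a bi-directional LSTM pass over a sequence. It is an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, the sizes are arbitrary, the input vectors stand in for character embeddings, and all function names (`make_params`, `lstm_step`, `run_lstm`, `blstm`) are hypothetical. It uses the standard LSTM gate equations with sigmoid gates and tanh activations.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_params(input_size, hidden_size, rng):
    """Random, untrained weights for one LSTM direction (illustrative only)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # One (weights, bias) pair per gate: input (i), forget (f), output (o), candidate (g).
    return {gate: (matrix(hidden_size, input_size + hidden_size), [0.0] * hidden_size)
            for gate in "ifog"}

def lstm_step(params, x, h_prev, c_prev):
    """One standard LSTM step: returns the new hidden state and cell state."""
    z = x + h_prev  # list concatenation: [x_t ; h_{t-1}]
    def affine(gate):
        W, b = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(a) for a in affine("i")]
    f = [sigmoid(a) for a in affine("f")]
    o = [sigmoid(a) for a in affine("o")]
    g = [math.tanh(a) for a in affine("g")]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def run_lstm(params, inputs, hidden_size):
    """Run one direction over the sequence, collecting the hidden state per position."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def blstm(fwd_params, bwd_params, inputs, hidden_size):
    """Concatenate left-to-right and right-to-left hidden states at each position."""
    forward = run_lstm(fwd_params, inputs, hidden_size)
    # Process the reversed sequence, then flip back so positions line up.
    backward = run_lstm(bwd_params, inputs[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Usage: five positions, each a 4-dimensional stand-in for a character embedding.
rng = random.Random(0)
INPUT_SIZE, HIDDEN_SIZE = 4, 3
fwd = make_params(INPUT_SIZE, HIDDEN_SIZE, rng)
bwd = make_params(INPUT_SIZE, HIDDEN_SIZE, rng)
sentence = [[rng.uniform(-1, 1) for _ in range(INPUT_SIZE)] for _ in range(5)]
features = blstm(fwd, bwd, sentence, HIDDEN_SIZE)
```

Each position ends up with a feature vector of size `2 * HIDDEN_SIZE`: the first half summarizes everything to its left, the second half everything to its right. In a segmentation model, a classifier over these per-character features would predict a tag for each character; that output layer is omitted here.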
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
