Chinese Lexical Analysis with Deep Bi-GRU-CRF Network
Zhenyu Jiao, Shuqi Sun, Ke Sun

TL;DR
This paper presents a deep Bi-GRU-CRF neural network that jointly performs Chinese word segmentation, POS tagging, and NER. It is trained on large automatically tagged corpora plus a small human-annotated corpus, with balanced sampling between corpora and regular fine-tuning of the CRF decoding layer.
Contribution
A joint deep Bi-GRU-CRF model for Chinese lexical analysis that is more accurate than the authors' previous best tool while remaining computationally efficient.
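One common way to cast segmentation, POS tagging, and NER as a single sequence-labeling problem is to give every character one combined label, such as a tag plus a begin/inside marker. The sketch below assumes that scheme; the label format `TAG-B` / `TAG-I`, the function name `decode_joint_labels`, and the example tags are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def decode_joint_labels(chars: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (word, tag) pairs from per-character labels.

    Assumes a hypothetical label format 'TAG-B' (word start) / 'TAG-I'
    (word continuation); the paper's actual label set may differ.
    """
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")

    words: List[Tuple[str, str]] = []
    current, current_tag = "", ""
    for ch, label in zip(chars, labels):
        tag, _, position = label.rpartition("-")
        # A 'B' marker, a change of tag, or an empty buffer starts a new word.
        if position == "B" or tag != current_tag or not current:
            if current:
                words.append((current, current_tag))
            current, current_tag = ch, tag
        else:
            current += ch
    if current:
        words.append((current, current_tag))
    return words


# Hypothetical example: one label per character yields words, POS tags,
# and an entity span in a single pass.
example = decode_joint_labels(
    "百度是公司", ["ORG-B", "ORG-I", "v-B", "n-B", "n-I"]
)
```

With labels in this form, a single tagger output is enough to recover word boundaries and their tags, which is what makes joint modeling possible with one decoding layer.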
Findings
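Achieved 95.5% accuracy on the test set, as evaluated by linguistic experts.
Reduced the error rate by roughly 13% (relative) compared with the authors' previous best Chinese lexical analysis tool.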
Processed 2.3K characters per second with one thread.
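To make the relationship between the first two findings concrete, the arithmetic can be sketched as follows. It assumes the usual definition of relative error reduction, (old error - new error) / old error. The implied accuracy of the previous tool is derived here, not reported in this summary, and is only approximate because the 13% figure is itself rounded.

```python
def previous_error_rate(new_accuracy: float, relative_reduction: float) -> float:
    """Back out the earlier error rate from the new accuracy and the
    relative error reduction, assuming reduction = (old - new) / old."""
    new_error = 1.0 - new_accuracy
    return new_error / (1.0 - relative_reduction)


new_accuracy = 0.955        # reported accuracy
relative_reduction = 0.13   # "roughly 13%", so the result is approximate

prev_error = previous_error_rate(new_accuracy, relative_reduction)
prev_accuracy = 1.0 - prev_error

print(f"new error rate:            {1.0 - new_accuracy:.2%}")
print(f"implied previous error:    {prev_error:.2%}")
print(f"implied previous accuracy: {prev_accuracy:.2%}")
```

Under these assumptions, an error rate of 4.5% after a roughly 13% relative reduction implies an earlier error rate of a little over 5%.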
Abstract
Lexical analysis is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end lexical analysis models with recurrent neural networks have gained increasing attention. In this report, we introduce a deep Bi-GRU-CRF network that jointly models word segmentation, part-of-speech tagging and named entity recognition tasks. We trained the model using several massive corpora pre-tagged by our best Chinese lexical analysis tool, together with a small, yet high-quality human annotated corpus. We conducted balanced sampling between different corpora to guarantee the influence of human annotations, and fine-tuned the CRF decoding layer regularly during the training process. As evaluated by linguistic experts, the model achieved a 95.5% accuracy on the test set, roughly 13% relative error reduction over our (previously) best Chinese…
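The balanced sampling mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible policy: each training example is drawn by first picking a corpus uniformly at random and then picking a sentence from it, so a small human-annotated corpus contributes as often as a massive pre-tagged one. The function name, the uniform ratio, and the toy corpora are assumptions; the abstract does not state the actual sampling ratios.

```python
import random
from typing import Dict, List, Sequence


def balanced_batches(
    corpora: Dict[str, Sequence[str]],
    batch_size: int,
    num_batches: int,
    seed: int = 0,
) -> List[List[str]]:
    """Draw batches where every corpus is equally likely per example,
    regardless of how many sentences it contains (assumed policy)."""
    if not corpora or any(len(c) == 0 for c in corpora.values()):
        raise ValueError("every corpus must contain at least one sentence")

    rng = random.Random(seed)
    names = sorted(corpora)  # fixed order keeps sampling reproducible
    batches: List[List[str]] = []
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            name = rng.choice(names)                 # uniform over corpora
            batch.append(rng.choice(corpora[name]))  # uniform within corpus
        batches.append(batch)
    return batches


# Toy data: a large auto-tagged corpus and a tiny human-annotated one.
toy_corpora = {
    "auto_tagged": [f"auto-{i}" for i in range(1000)],
    "human_annotated": ["human-0", "human-1"],
}
batches = balanced_batches(toy_corpora, batch_size=10, num_batches=100)
samples = [s for batch in batches for s in batch]
human_share = sum(s.startswith("human-") for s in samples) / len(samples)
```

Sampling proportionally to corpus size would give the human-annotated sentences a share of about 0.2% in this toy setup; sampling per corpus raises it to about one half, which is the effect the abstract describes as guaranteeing the influence of human annotations.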
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Conditional Random Field
