Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF
Yan Shao, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

TL;DR
This paper introduces a character-based bidirectional RNN-CRF model for joint Chinese segmentation and POS tagging. Using character representations that capture rich contextual information, it achieves state-of-the-art results on CTB5 and remains accurate on CTB9 and UD Chinese.
Contribution
The paper adapts the bidirectional RNN-CRF architecture with novel character vector representations for Chinese, improving joint segmentation and POS tagging accuracy.
Findings
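Achieves a 94.38 F1-score for joint segmentation and POS tagging on CTB5.
Is accurate and robust across datasets of different sizes, genres and annotation schemes (CTB5, CTB9 and UD Chinese).
Outperforms previous state-of-the-art methods on CTB5.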
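To make the reported metric concrete, here is a minimal sketch of how a joint segmentation and POS tagging F1-score is commonly computed: a predicted word counts as correct only if both its character span and its POS tag match the gold annotation. The function names and the toy sentence are illustrative assumptions, not the paper's evaluation script.

```python
def to_spans(tagged_words):
    """Convert [(word, pos), ...] into a set of (start, end, pos) character spans."""
    spans, start = set(), 0
    for word, pos in tagged_words:
        end = start + len(word)
        spans.add((start, end, pos))
        start = end
    return spans


def joint_f1(gold, pred):
    """F1 over words whose boundaries AND POS tag both match the gold standard."""
    gold_spans, pred_spans = to_spans(gold), to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): the prediction wrongly splits the verb.
gold = [("我", "PN"), ("喜欢", "VV"), ("北京", "NR")]
pred = [("我", "PN"), ("喜", "VV"), ("欢", "VV"), ("北京", "NR")]
score = joint_f1(gold, pred)  # 2 of 4 predicted and 2 of 3 gold words match
```

In this toy case a single segmentation error costs both precision and recall, which is why joint scores are lower than segmentation-only scores on the same output.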
Abstract
We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and sub-character-level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets of different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving a 94.38 F1-score for joint segmentation and POS tagging.
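A character-based joint model treats the task as sequence tagging: each character receives one label that encodes both its position in a word and the word's POS tag. The abstract does not spell out the label scheme, so the sketch below assumes a common one, BIES boundary markers (Begin, Inside, End, Single) joined to the POS tag. The helper names `encode` and `decode` are hypothetical.

```python
def encode(tagged_words):
    """Turn [(word, pos), ...] into parallel lists of characters and joint tags."""
    chars, tags = [], []
    for word, pos in tagged_words:
        if len(word) == 1:
            boundaries = ["S"]
        else:
            boundaries = ["B"] + ["I"] * (len(word) - 2) + ["E"]
        for ch, boundary in zip(word, boundaries):
            chars.append(ch)
            tags.append(f"{boundary}-{pos}")
    return chars, tags


def decode(chars, tags):
    """Rebuild [(word, pos), ...] from characters and joint tags."""
    words, buffer, pos = [], "", None
    for ch, tag in zip(chars, tags):
        boundary, pos = tag.split("-", 1)
        buffer += ch
        if boundary in ("E", "S"):  # a word ends at this character
            words.append((buffer, pos))
            buffer = ""
    if buffer:  # flush an unterminated word from an ill-formed tag sequence
        words.append((buffer, pos))
    return words


# Toy example (hypothetical data).
sentence = [("我", "PN"), ("喜欢", "VV"), ("北京", "NR")]
chars, tags = encode(sentence)
# tags == ["S-PN", "B-VV", "E-VV", "B-NR", "E-NR"]
```

Under this assumed scheme, a tagger only has to predict one label per character, and the segmentation and POS tags are both recovered by `decode`.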
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
