Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning
Nanyun Peng, Mark Dredze

TL;DR
This paper shows that jointly training neural representations for word segmentation and named entity recognition (NER) significantly improves NER on Chinese social media data, yielding a nearly 5% absolute improvement over previously published results.
Contribution
It introduces a joint training approach in which an LSTM-CRF model learns word segmentation and NER together, so that the representations learned for segmentation inform NER on Chinese social media.
Findings
Nearly 5% absolute improvement in NER over previously published results.
Neural representations from segmentation aid NER performance.
Joint training outperforms separate models.
Abstract
Named entity recognition, and other information extraction tasks, frequently use linguistic features such as part of speech tags or chunkings. For languages where word boundaries are not readily identified in text, word segmentation is a key first step to generating features for an NER system. While using word boundary tags as features is helpful, the signals that aid in identifying these boundaries may provide richer information for an NER system. New state-of-the-art word segmentation systems use neural models to learn representations for predicting word boundaries. We show that these same representations, jointly trained with an NER system, yield significant improvements in NER for Chinese social media. In our experiments, jointly training NER and word segmentation with an LSTM-CRF model yields nearly 5% absolute improvement over previously published results.
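To make the baseline idea of "word boundary tags as features" concrete, here is a minimal sketch that turns a segmented sentence into per-character boundary tags. It assumes the common BMES scheme (Begin, Middle, End, Single); the abstract does not say which tag set the authors use, and the function names and the example sentence are hypothetical. The paper's argument is that the learned representations behind a segmenter carry more information than these discrete tags, so this illustrates only the simpler feature-based setup.

```python
def boundary_tags(words):
    """Map a segmented sentence (a list of words) to per-character BMES tags.

    Assumption: BMES is used here only as a common convention,
    not as the tag set confirmed by the paper.
    """
    tags = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character, any middle characters, last character
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def char_features(words):
    """Pair each character with its boundary tag, as input features
    for a character-level NER tagger."""
    chars = [ch for word in words for ch in word]
    return list(zip(chars, boundary_tags(words)))


# Hypothetical output of a word segmenter for one sentence.
segmented = ["我", "在", "北京大学", "读书"]
for ch, tag in char_features(segmented):
    print(ch, tag)
```

A character-level NER model would consume these (character, tag) pairs as features; the joint approach described in the abstract instead shares the segmenter's learned representations with the NER model during training.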
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
