Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text
Shengbin Jia, Ling Ding, Xiaojun Chen, Shijia E, Yang Xiang

TL;DR
This paper introduces UIcwsNN, a model that leverages uncertain segmentation information to improve Chinese NER on social media text, effectively reducing error propagation and enhancing performance.
Contribution
The paper proposes a three-step encoding of uncertain segmentation states (candidate position embedding, position selective attention, adaptive word convolution) to improve Chinese NER accuracy on social media data.
Findings
Improves performance by more than 2% over previous methods.
Effectively alleviates segmentation error cascading in social media Chinese NER.
Demonstrates the effectiveness of encoding uncertain segmentation information.
Abstract
Chinese word segmentation is necessary to provide word-level information for Chinese named entity recognition (NER) systems. However, segmentation error propagation is a challenge for Chinese NER when processing colloquial data such as social media text. In this paper, we propose a model (UIcwsNN) that specializes in identifying entities in Chinese social media text, in particular by leveraging ambiguous word segmentation information. This uncertain information contains all the potential segmentation states of a sentence, which gives the model a channel to infer deep word-level characteristics. We propose a trilogy (i.e., candidate position embedding -> position selective attention -> adaptive word convolution) to encode uncertain word segmentation information and acquire appropriate word-level representations. Experimental results on the social media corpus show that our model…
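To make "all the potential segmentation states of a sentence" concrete, here is a minimal sketch of one way such uncertain information could be collected before any embedding step. It is not the authors' implementation: the function name `candidate_positions`, the B/M/E/S tag set, the toy lexicon, and the `max_len` cutoff are all assumptions made for illustration. For each character it records every position the character could occupy in any lexicon word covering it, instead of committing to a single segmentation.

```python
from typing import List, Set


def candidate_positions(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Set[str]]:
    """Collect, per character, every B/M/E/S tag it could take.

    B = begins a word, M = inside a word, E = ends a word, S = single-character word.
    Hypothetical helper for illustration; single characters are always allowed as S.
    """
    tags: List[Set[str]] = [set() for _ in sentence]
    for start in range(len(sentence)):
        last_end = min(start + max_len, len(sentence))
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if len(word) == 1:
                tags[start].add("S")
                continue
            if word not in lexicon:
                continue  # only lexicon matches count as multi-character candidates
            tags[start].add("B")
            tags[end - 1].add("E")
            for mid in range(start + 1, end - 1):
                tags[mid].add("M")
    return tags


# Toy lexicon (an assumption) for a classically ambiguous sentence.
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
sentence = "南京市长江大桥"
tags = candidate_positions(sentence, lexicon)

for char, char_tags in zip(sentence, tags):
    print(char, sorted(char_tags))
```

In this sketch the character 市 keeps the tags B, E, and S at once, because it may end 南京市, begin 市长, or stand alone. A downstream model could embed each tag set and learn which reading to trust, which is the role the abstract assigns to the later attention and convolution steps.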
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
