Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and POS Tagging for Micro-blog Texts
Xipeng Qiu, Peng Qian, Liusong Yin, Shiyu Wu, Xuanjing Huang

TL;DR
This paper provides an overview of the NLPCC 2015 shared task on Chinese word segmentation and POS tagging specifically for informal micro-blog texts, highlighting datasets, approaches, and results.
Contribution
It introduces a new dataset for micro-blog Chinese text and compares various approaches across different resource tracks in a shared task setting.
Findings
Different approaches show varying effectiveness on informal texts
Resource availability impacts system performance
The shared task fosters progress in Chinese micro-blog NLP
Abstract
In this paper, we give an overview of the shared task at the 4th CCF Conference on Natural Language Processing & Chinese Computing (NLPCC 2015): Chinese word segmentation and part-of-speech (POS) tagging for micro-blog texts. Unlike the widely used newswire datasets, the dataset of this shared task consists of relatively informal micro-texts. The shared task has two sub-tasks: (1) individual Chinese word segmentation and (2) joint Chinese word segmentation and POS tagging. Each sub-task has three tracks to distinguish systems that use different resources. We first introduce the dataset and task, then characterize the different approaches of the participating systems, report the test results, and provide an overview analysis of these results. An online system is available for open registration and evaluation at http://nlp.fudan.edu.cn/nlpcc2015.
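To make the difference between the two sub-tasks concrete, the following is a minimal sketch of how their outputs might be scored. It assumes span-based precision, recall, and F1, which is the conventional metric for segmentation and joint tagging; the abstract itself does not name the metric. The function names, the example sentence, and the POS tags (`NT`, `NN`, `VA`) are illustrative assumptions, not taken from the shared task data.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmented sentence into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def to_tagged_spans(tagged: List[Tuple[str, str]]) -> Set[Tuple[int, int, str]]:
    """Same as to_spans, but each span also carries its POS tag."""
    spans, start = set(), 0
    for word, tag in tagged:
        spans.add((start, start + len(word), tag))
        start += len(word)
    return spans


def prf(gold: set, pred: set) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over two sets of spans."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Sub-task 1 (segmentation only): a word counts as correct when both of its
# boundaries match the gold standard.
gold_seg = ["今天", "天气", "不错"]
pred_seg = ["今天", "天", "气", "不错"]  # hypothetical system output
seg_scores = prf(to_spans(gold_seg), to_spans(pred_seg))

# Sub-task 2 (joint segmentation and POS tagging): a word counts as correct
# only when its boundaries AND its tag match. Tags here are illustrative.
gold_pos = [("今天", "NT"), ("天气", "NN"), ("不错", "VA")]
pred_pos = [("今天", "NT"), ("天气", "NN"), ("不错", "NN")]
pos_scores = prf(to_tagged_spans(gold_pos), to_tagged_spans(pred_pos))

print("segmentation P/R/F1:", seg_scores)
print("joint seg+POS P/R/F1:", pos_scores)
```

Under this scheme a segmentation error also costs the joint score, since a span with wrong boundaries can never match a gold tagged span. That is one reason the two sub-tasks are usually reported separately.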
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Web Data Mining and Analysis
