Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation

Chunqi Wang; Bo Xu

arXiv:1711.04411 · cs.CL · November 15, 2017 · 6 cites

TL;DR

This paper introduces a convolutional neural network model for Chinese word segmentation that captures n-gram features automatically and integrates word embeddings, reporting state-of-the-art results on the PKU and MSR benchmarks.

Contribution

It proposes a CNN-based model that automatically captures n-gram features and effectively integrates word embeddings for improved Chinese word segmentation.
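
To illustrate how a convolution can pick up n-gram features without hand-written feature templates, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `conv1d_same`, the single-filter setup, the zero padding, and the ReLU activation are all assumptions made for illustration. A filter of width k looks at k adjacent character vectors at once, so each output value is a learned feature of the k-gram centered on that character.

```python
def conv1d_same(seq, kernel, bias=0.0):
    """Apply one convolution filter over a sequence of character vectors.

    seq:    list of vectors, one per character (hypothetical embeddings)
    kernel: list of k weight vectors (k odd), one per window position
    Returns one scalar feature per character (same length as seq).
    """
    k = len(kernel)
    assert k % 2 == 1, "use an odd window so it can be centered"
    dim = len(seq[0])
    pad = k // 2
    zero = [0.0] * dim
    # Zero-pad both ends so every character has a full window.
    padded = [zero] * pad + list(seq) + [zero] * pad

    out = []
    for i in range(len(seq)):
        total = bias
        # Window covers characters i-pad .. i+pad, i.e. a k-gram.
        for j in range(k):
            total += sum(w * x for w, x in zip(kernel[j], padded[i + j]))
        out.append(max(0.0, total))  # ReLU, an assumed activation
    return out


# Toy example: 1-dimensional "embeddings" and a trigram filter of ones.
features = conv1d_same([[1.0], [2.0], [3.0]], [[1.0], [1.0], [1.0]])
print(features)
```

Stacking several such layers widens the span of characters each position can see, which is one way a model of this kind could cover longer n-grams.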

Findings

01 Achieves 95.7% on PKU and 97.3% on MSR without feature engineering.

02 With word embeddings, reaches 96.5% on PKU and 98.0% on MSR.

03 Outperforms previous models on the PKU and MSR benchmark datasets.

Abstract

The character-based sequence labeling framework is flexible and efficient for Chinese word segmentation (CWS). Recently, many character-based neural models have been applied to CWS. While they obtain good performance, they have two obvious weaknesses. The first is that they rely heavily on manually designed bigram features, i.e. they are not good at capturing n-gram features automatically. The second is that they make no use of full word information. For the first weakness, we propose a convolutional neural model, which is able to capture rich n-gram features without any feature engineering. For the second one, we propose an effective approach to integrate the proposed model with word embeddings. We evaluate the model on two benchmark datasets: PKU and MSR. Without any feature engineering, the model obtains competitive performance: 95.7% on PKU and 97.3% on MSR. Armed with word embeddings,…
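
The character-based sequence labeling view of segmentation mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming the widely used B/M/E/S tag scheme (begin, middle, end, single-character word); the abstract does not state which tag set the paper uses, and the helper names `words_to_tags` and `tags_to_words` are hypothetical.

```python
def words_to_tags(words):
    """Turn a segmented sentence into one tag per character (assumed BMES scheme)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


tags = words_to_tags(["我", "爱", "北京"])
print(tags)
print(tags_to_words("我爱北京", tags))
```

Under this framing, a segmentation model only has to predict one tag per character, and the word boundaries follow from the tags.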

Code & Models

Repositories

chqiwang/convseg (TensorFlow, official)

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling