A Feature-Rich Vietnamese Named-Entity Recognition Model

Pham Quang Nhat Minh

arXiv:1803.04375·cs.CL·March 13, 2018·5 cites

TL;DR

This paper introduces a feature-rich Vietnamese NER model that combines word, word-shape, PoS, chunk, Brown-cluster, and word-embedding features within a CRF framework. It reports state-of-the-art accuracy and systematically evaluates how the output of Vietnamese NLP toolkits (word segmentation, PoS tagging, chunking) affects NER performance.

Contribution

According to the authors, this is the first systematic extrinsic evaluation of basic Vietnamese NLP toolkits on the downstream NER task. It finds that automatically generated word segmentation helps, while PoS and chunking information from those toolkits brings limited benefit.
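An extrinsic evaluation of this kind amounts to holding the NER model fixed, swapping the upstream toolkit, and comparing NER scores. The following is a minimal sketch of that comparison, not the paper's evaluation script. It assumes entity-level F1 over BIO tags as the metric (the abstract only says "accuracy"), assumes predictions are already aligned to the gold tokens, and uses hypothetical toolkit names `toolkit_A` and `toolkit_B`.

```python
def bio_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence."""
    spans, start, etype = set(), None, None
    # A trailing "O" sentinel closes any span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            # A stray "I-" tag is leniently treated as the start of a new span.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold_sents, pred_sents):
    """Entity-level F1 over a corpus given as lists of BIO tag sequences."""
    gold = {(s,) + sp for s, tags in enumerate(gold_sents) for sp in bio_spans(tags)}
    pred = {(s,) + sp for s, tags in enumerate(pred_sents) for sp in bio_spans(tags)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the same NER model run on top of two hypothetical toolkits.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
predictions = {
    "toolkit_A": [["B-PER", "I-PER", "O", "B-LOC"]],
    "toolkit_B": [["B-PER", "O", "O", "B-LOC"]],
}
scores = {name: entity_f1(gold, pred) for name, pred in predictions.items()}
```

Comparing `scores` across toolkits is the extrinsic signal: a toolkit is judged by the downstream NER score it enables rather than by its own tagging accuracy.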

Findings

01. Word segmentation improves NER accuracy.

02. PoS and chunking features show limited benefits.

03. The model achieves state-of-the-art accuracy for Vietnamese NER.

Abstract

In this paper, we present a feature-based named-entity recognition (NER) model that achieves state-of-the-art accuracy for the Vietnamese language. We combine word and word-shape features, PoS and chunk tags, Brown-cluster-based features, and word-embedding-based features in a Conditional Random Fields (CRF) model. We also explore how the word segmentation, PoS tagging, and chunking output of many popular Vietnamese NLP toolkits affects the accuracy of the proposed feature-based NER model. To our knowledge, this is the first work that systematically performs an extrinsic evaluation of basic Vietnamese NLP toolkits on the downstream NER task. Experimental results show that while automatically generated word segmentation is useful, the PoS and chunking information generated by Vietnamese NLP tools does not benefit the proposed feature-based NER model.
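To make the feature combination concrete, here is a minimal sketch of per-token feature extraction for a CRF tagger. It is an illustration under stated assumptions, not the author's implementation: the feature names, the Brown-cluster prefix lengths (4 and 6), the one-token context window, and the input layout (a sentence as a list of dicts with `word`, `pos`, and `chunk` keys) are all hypothetical. The output is a list of feature dicts, a format that common CRF toolkits accept.

```python
import re


def word_shape(token):
    """Collapse a token into a coarse shape, e.g. 'Ha_Noi' -> 'Xx_Xx'."""
    # Map letters to X/x by case, digits to d, then squeeze repeated symbols.
    shape = re.sub(r"[^\W\d_]", lambda m: "X" if m.group().isupper() else "x", token)
    shape = re.sub(r"\d", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(sent, i, brown=None, embeddings=None):
    """Build the feature dict for token i of a sentence."""
    word = sent[i]["word"]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.shape": word_shape(word),
        # PoS and chunk tags come from an upstream toolkit and may be absent.
        "pos": sent[i].get("pos", "NONE"),
        "chunk": sent[i].get("chunk", "NONE"),
    }
    # Brown clusters: prefixes of the cluster bit string (lengths are assumed).
    if brown and word in brown:
        for k in (4, 6):
            feats[f"brown.prefix{k}"] = brown[word][:k]
    # Word embeddings: one real-valued feature per dimension.
    if embeddings and word in embeddings:
        for d, value in enumerate(embeddings[word]):
            feats[f"emb.{d}"] = float(value)
    # Neighbouring words as context (window size 1 is assumed).
    if i > 0:
        feats["-1:word.lower"] = sent[i - 1]["word"].lower()
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["+1:word.lower"] = sent[i + 1]["word"].lower()
    else:
        feats["EOS"] = True
    return feats


def sent_features(sent, brown=None, embeddings=None):
    """Feature dicts for every token in a sentence."""
    return [token_features(sent, i, brown, embeddings) for i in range(len(sent))]


# Toy usage with made-up cluster and embedding tables.
sent = [{"word": "Ha_Noi", "pos": "Np", "chunk": "B-NP"}, {"word": "2018"}]
feats = sent_features(
    sent,
    brown={"Ha_Noi": "110100"},
    embeddings={"2018": [0.5, -1.0]},
)
```

Because PoS and chunk tags enter only as two optional keys, dropping them or swapping the toolkit that produced them requires no change to the rest of the feature set.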

Code & Models

Repositories

minhpqn/vietner

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies