A Feature-Rich Vietnamese Named-Entity Recognition Model
Pham Quang Nhat Minh

TL;DR
This paper introduces a feature-rich Vietnamese NER model that combines various linguistic features within a CRF framework, achieving state-of-the-art accuracy and systematically evaluating the impact of NLP tools on NER performance.
Contribution
The work is the first systematic extrinsic evaluation of Vietnamese NLP toolkits on the downstream NER task. It shows that automatically generated word segmentation helps, while PoS and chunking information produced by those toolkits brings limited benefit.
Findings
Word segmentation improves NER accuracy.
PoS and chunking features show limited benefits.
The model achieves state-of-the-art accuracy for Vietnamese NER.
Abstract
In this paper, we present a feature-based named-entity recognition (NER) model that achieves state-of-the-art accuracy for the Vietnamese language. We combine word and word-shape features, PoS, chunk, Brown-cluster-based features, and word-embedding-based features in a Conditional Random Fields (CRF) model. We also explore how the word segmentation, PoS tagging, and chunking output of many popular Vietnamese NLP toolkits affects the accuracy of the proposed feature-based NER model. To our knowledge, this is the first work that systematically performs an extrinsic evaluation of basic Vietnamese NLP toolkits on the downstream NER task. Experimental results show that while automatically generated word segmentation is useful, PoS and chunking information generated by Vietnamese NLP tools does not benefit the proposed feature-based NER model.
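To make the feature combination described in the abstract more concrete, the following is a minimal sketch of per-token feature extraction for a CRF tagger. It is not the paper's feature template: the function names (`word_shape`, `token_features`), the feature keys, the Brown-cluster prefix lengths (4 and 6), and the one-token context window are all assumptions chosen for illustration. Word-embedding features are left out to keep the example dependency-free. The input is assumed to be a word-segmented sentence of `(word, pos, chunk)` triples.

```python
import unicodedata


def word_shape(token):
    """Collapse a token into a coarse shape, e.g. 'Ha_Noi' -> 'Xx_Xx'.

    Uppercase letters map to 'X', lowercase letters to 'x', digits to 'd',
    and runs of the same code are merged. (Illustrative definition only.)
    """
    # NFC keeps accented Vietnamese letters as single code points.
    token = unicodedata.normalize("NFC", token)
    shape = []
    for ch in token:
        if ch.isdigit():
            code = "d"
        elif ch.isalpha():
            code = "X" if ch.isupper() else "x"
        else:
            code = ch
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)


def token_features(sent, i, brown_clusters=None, window=1):
    """Build a feature dict for token i of a (word, pos, chunk) sentence.

    brown_clusters is a hypothetical dict mapping a word to its Brown
    cluster bit string; prefixes of that string act as features.
    """
    word, pos, chunk = sent[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.shape": word_shape(word),
        "pos": pos,
        "chunk": chunk,
    }
    if brown_clusters and word in brown_clusters:
        path = brown_clusters[word]
        for k in (4, 6):  # assumed prefix lengths
            feats[f"brown.prefix{k}"] = path[:k]
    # Neighbouring words within the context window, with boundary markers.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(sent):
            feats[f"{offset:+d}:word.lower"] = sent[j][0].lower()
        else:
            feats[f"{offset:+d}:pad"] = "BOS" if j < 0 else "EOS"
    return feats


def sentence_features(sent, brown_clusters=None):
    """Feature dicts for every token in the sentence."""
    return [token_features(sent, i, brown_clusters) for i in range(len(sent))]
```

A list of such dicts per sentence is the kind of input that dict-based CRF toolkits (for example sklearn-crfsuite) accept. Dropping the `"pos"` and `"chunk"` keys is one simple way to run the sort of ablation the abstract reports, where those two feature groups did not help.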
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
