On the Vietnamese Name Entity Recognition: A Deep Learning Method Approach

Ngoc C. Lê, Ngoc-Yen Nguyen, and Anh-Duong Trinh

arXiv:1912.01109 · cs.CL · December 4, 2019

TL;DR

This paper introduces a deep learning approach combining Bi-LSTM and CRF for Vietnamese NER, utilizing word embeddings and semantic features to improve accuracy on the VLSP2016 dataset.

Contribution

It presents a novel deep learning model for Vietnamese NER that combines word embeddings with semantic and syntactic features, achieving state-of-the-art results.
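The input representation described here can be illustrated with a minimal pure-Python sketch. It builds each word's feature vector by concatenating a word embedding with one-hot POS and chunk-tag encodings. Several details are assumptions made only for illustration and are not taken from the paper:

- the tag inventories and the toy embedding table;
- the function names;
- the zero-vector fallback for unknown words;
- word-segmented input in which syllables are joined by underscores.

The sketch also omits the Bi-LSTM-derived hidden syntactic feature vector mentioned in the abstract. It is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical tag inventories for illustration only; the paper's actual
# POS and chunk tag sets are not listed on this page.
POS_TAGS = ["N", "V", "A", "Np", "CH"]
CHUNK_TAGS = ["B-NP", "I-NP", "B-VP", "I-VP", "O"]


def one_hot(tag: str, inventory: List[str]) -> List[float]:
    """Return a one-hot vector for `tag`; unknown tags map to all zeros."""
    return [1.0 if tag == t else 0.0 for t in inventory]


def word_features(word: str, pos: str, chunk: str,
                  embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Concatenate [word embedding ; POS one-hot ; chunk one-hot]."""
    # Assumption: out-of-vocabulary words fall back to a zero vector.
    emb = embeddings.get(word, [0.0] * dim)
    return emb + one_hot(pos, POS_TAGS) + one_hot(chunk, CHUNK_TAGS)


# Usage with a toy 3-dimensional embedding table (hypothetical values).
toy_embeddings = {"Hà_Nội": [0.1, 0.2, 0.3]}
vec = word_features("Hà_Nội", "Np", "B-NP", toy_embeddings, dim=3)
print(len(vec))  # 13 = 3 (embedding) + 5 (POS) + 5 (chunk)
```

In the full model the concatenated vector, rather than the embedding alone, would be fed to the Bi-LSTM. This lets the network see explicit syntactic cues alongside distributional ones.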

Findings

01

Achieved the best reported results on the VLSP2016 dataset

02

Improved NER accuracy by combining semantic and syntactic features

03

Demonstrated the effectiveness of the Bi-LSTM-CRF architecture for Vietnamese NER
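In a Bi-LSTM-CRF tagger, the Bi-LSTM produces per-token tag scores (emissions) and the CRF layer adds tag-to-tag transition scores. At inference, the best tag sequence is typically found with Viterbi decoding. The minimal pure-Python sketch below assumes the emission and transition scores are already computed. The function name, the BIO tag set, and the toy numbers are illustrative only. Some CRF implementations also use start/end transition scores, which this sketch omits.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def viterbi_decode(emissions: Matrix, transitions: Matrix) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]   : score of tag j at position t (e.g. Bi-LSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    Assumes the sentence has at least one token.
    """
    n_tags = len(transitions)
    # best[j]: best score of any partial path ending in tag j
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for j in range(n_tags):
            # Previous tag i maximising best[i] + transitions[i][j]
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            ptrs.append(i_star)
        best = new_best
        backpointers.append(ptrs)
    # Trace back from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best[last]


# Usage with a toy BIO tag set and hand-picked scores.
tags = ["O", "B-PER", "I-PER"]
NEG = -1e4  # large penalty: I-PER may not directly follow O
transitions = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.5, 0.0], [0.0, 0.0, 2.0]]
path, score = viterbi_decode(emissions, transitions)
print([tags[i] for i in path], score)  # ['B-PER', 'I-PER'] 2.5
```

Picking the highest-scoring tag independently at each position would give `O I-PER`, which the transition penalty rules out. Joint decoding through the CRF transitions instead returns the valid sequence `B-PER I-PER`.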

Abstract

Named entity recognition (NER) plays an important role in text-based information retrieval. In this paper, we combine Bidirectional Long Short-Term Memory (Bi-LSTM) \cite{hochreiter1997,schuster1997} with a Conditional Random Field (CRF) \cite{lafferty2001} to create a novel deep learning model for the NER problem. Each input word of the model is represented by a Word2Vec-trained vector. The word embeddings are trained on about one million articles from 2018 collected through a Vietnamese news portal (baomoi.com). In addition, we concatenate the Word2Vec-trained vector \cite{mikolov2013} with a semantic feature vector (Part-Of-Speech (POS) tags and chunk tags) and a hidden syntactic feature vector (extracted by a Bi-LSTM network). With this representation, the system achieves the best result reported so far for Vietnamese NER. Experiments were conducted on the VLSP2016 dataset (Vietnamese Language and Speech Processing…
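Training a CRF layer usually means minimising the negative log-likelihood of the gold tag sequence. That requires the log-partition function over all possible tag sequences, which the forward algorithm computes in time linear in sentence length. The sketch below is a minimal pure-Python illustration assuming a plain linear-chain CRF (no start/end transitions, precomputed emission scores). It is not taken from the paper, and all function names are hypothetical.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(emissions: Matrix, transitions: Matrix, tags: Sequence[int]) -> float:
    """Unnormalised score of one tag sequence: emissions plus transitions."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """log Z via the forward algorithm (log-sum over all tag sequences)."""
    n_tags = len(transitions)
    # alpha[j]: log-sum of scores of all partial paths ending in tag j
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            for j in range(n_tags)
        ]
    return log_sum_exp(alpha)


def crf_nll(emissions: Matrix, transitions: Matrix, gold_tags: Sequence[int]) -> float:
    """Negative log-likelihood -log p(gold_tags | sentence)."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, gold_tags)


# Usage with toy scores for tags [O, B-PER, I-PER] (hypothetical values).
emissions = [[1.0, 0.5, 0.0], [0.0, 0.3, 2.0]]
transitions = [[0.1, 0.0, -2.0], [0.0, 0.2, 0.4], [0.3, 0.0, 0.5]]
print(crf_nll(emissions, transitions, [1, 2]))  # loss for gold tags B-PER, I-PER
```

In a full model the emission scores would come from the Bi-LSTM, and gradients of this loss would update both layers. Working in log space with `log_sum_exp` avoids overflow on long sentences.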