An Empirical Study for Vietnamese Constituency Parsing with Pre-training

Tuan-Vi Tran; Xuan-Thien Pham; Duc-Vu Nguyen; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen

arXiv:2010.09623·cs.CL·October 21, 2020

TL;DR

This paper evaluates a span-based Vietnamese constituency parser that pairs a self-attention encoder with a CKY-style chart decoder. It compares the pre-trained models XLM-Roberta and PhoBERT and finds that XLM-Roberta achieves the higher F1-score on both datasets, VietTreebank and NIIVTB1.

Contribution

It presents an empirical analysis of Vietnamese constituency parsing with pre-trained models, highlighting the effectiveness of XLM-Roberta over PhoBERT.

Findings

01

XLM-Roberta outperforms PhoBERT in F1-score on both datasets.

02

On VietTreebank, the model with XLM-Roberta achieved an F1-score of 81.19%.

03

On NIIVTB1, the model with XLM-Roberta achieved an F1-score of 85.70%.
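
For readers unfamiliar with the metric, the F1-scores above are normally computed over labeled brackets: each constituent is a (start, end, label) triple, and precision and recall count the triples shared by the predicted and gold trees. The sketch below illustrates that calculation for a single sentence. This page does not say which scoring tool the authors used, so the function name `bracket_f1` and the example spans are hypothetical, and details such as punctuation handling are ignored.

```python
from collections import Counter


def bracket_f1(gold, pred):
    """Labeled-bracket F1 between two lists of (start, end, label) spans.

    Illustrative only: the scoring setup used in the paper is not
    described on this page.
    """
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Multiset intersection counts each matching bracket at most once.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical three-word sentence: the prediction attaches VP to the wrong span.
gold_spans = [(0, 3, "S"), (0, 1, "NP"), (1, 3, "VP"), (2, 3, "NP")]
pred_spans = [(0, 3, "S"), (0, 1, "NP"), (0, 2, "VP"), (2, 3, "NP")]
print(bracket_f1(gold_spans, pred_spans))
```

Three of the four brackets match in each direction, so precision, recall, and F1 all equal 0.75 in this toy case.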

Abstract

In this work, we use a span-based approach for Vietnamese constituency parsing. Our method combines a self-attention encoder with a chart decoder that uses a CKY-style inference algorithm. We analyze experimental results comparing our method with the pre-trained models XLM-Roberta and PhoBERT on two Vietnamese datasets, VietTreebank and NIIVTB1. The results show that our model with XLM-Roberta achieved a significantly better F1-score than with the other pre-trained models: 81.19% on VietTreebank and 85.70% on NIIVTB1.
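
The CKY-style chart decoding mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the encoder has already produced a score for every (span, label) pair, that a tree's score is the sum of its labeled span scores, and that the empty string `""` stands for "no constituent". The function name `cky_decode` and the toy scores are hypothetical.

```python
def cky_decode(scores, n):
    """Find the highest-scoring binary tree over a sentence of n words.

    scores[(i, j)] maps each candidate label to a score for the span
    covering words i..j-1. Returns (total_score, spans), where spans is
    a list of (start, end, label) triples in pre-order.
    """
    best = {}  # (i, j) -> best total score of any subtree over the span
    back = {}  # (i, j) -> (chosen label, chosen split point or None)

    # Fill the chart bottom-up, from single words to the full sentence.
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            label, label_score = max(scores[(i, j)].items(), key=lambda kv: kv[1])
            if length == 1:
                best[(i, j)] = label_score
                back[(i, j)] = (label, None)
            else:
                # Pick the split that maximizes the two sub-span scores.
                split = max(range(i + 1, j),
                            key=lambda k: best[(i, k)] + best[(k, j)])
                best[(i, j)] = label_score + best[(i, split)] + best[(split, j)]
                back[(i, j)] = (label, split)

    def collect(i, j):
        label, split = back[(i, j)]
        spans = [(i, j, label)]
        if split is not None:
            spans += collect(i, split) + collect(split, j)
        return spans

    return best[(0, n)], collect(0, n)


# Hypothetical scores for a three-word sentence.
toy_scores = {
    (0, 1): {"NP": 1.0}, (1, 2): {"": 0.0}, (2, 3): {"NP": 1.0},
    (0, 2): {"": 0.0, "VP": -1.0}, (1, 3): {"VP": 2.0, "": 0.0},
    (0, 3): {"S": 3.0},
}
total, tree_spans = cky_decode(toy_scores, 3)
print(total, tree_spans)
```

The chart has O(n^2) cells and each cell tries O(n) split points, so decoding takes O(n^3) time in the sentence length. Because the label choice for a span does not depend on the split, the two maximizations can be done independently within each cell.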

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies