An Empirical Study for Vietnamese Constituency Parsing with Pre-training
Tuan-Vi Tran, Xuan-Thien Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

TL;DR
This paper evaluates span-based Vietnamese constituency parsing with a self-attention encoder and CKY-style chart decoding, compares the pre-trained models XLM-Roberta and PhoBERT, and finds that XLM-Roberta achieves higher F1-scores on both datasets tested.
Contribution
It presents an empirical analysis of Vietnamese constituency parsing with pre-trained models, highlighting the effectiveness of XLM-Roberta over PhoBERT.
Findings
XLM-Roberta outperforms PhoBERT in F1-score on both datasets.
With XLM-Roberta, the model achieved an F1-score of 81.19% on VietTreebank.
With XLM-Roberta, the model achieved an F1-score of 85.70% on NIIVTB1.
Abstract
In this work, we use a span-based approach for Vietnamese constituency parsing. Our method combines a self-attention encoder with a chart decoder that uses a CKY-style inference algorithm. We analyze experimental results comparing the pre-trained models XLM-Roberta and PhoBERT within this method on two Vietnamese datasets, VietTreebank and NIIVTB1. The results show that our model with XLM-Roberta achieved a significantly better F1-score than with other pre-trained models: 81.19% on VietTreebank and 85.70% on NIIVTB1.
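The CKY-style chart decoding mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring function, so the example assumes each span (i, j) already has a single real-valued score (in a full parser this would come from the encoder, with one score per label), and it omits labels entirely. The names `cky_decode` and `span_scores` are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open word span [i, j)


def cky_decode(span_scores: Dict[Span, float], n: int) -> Tuple[float, List[Span]]:
    """Find the binary tree over n >= 1 words that maximizes the sum of span scores.

    span_scores is an assumed input: a score for each span (missing spans score 0.0).
    Returns the best total score and the tree's spans in pre-order.
    """
    best: Dict[Span, float] = {}   # best[(i, j)]: best subtree score for span (i, j)
    split: Dict[Span, int] = {}    # split[(i, j)]: split point chosen for span (i, j)

    # Fill the chart bottom-up, from short spans to the full sentence.
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            score = span_scores.get((i, j), 0.0)
            if length == 1:
                best[(i, j)] = score
                continue
            # Choose the split point whose two children score highest together.
            k_best = max(range(i + 1, j), key=lambda k: best[(i, k)] + best[(k, j)])
            split[(i, j)] = k_best
            best[(i, j)] = score + best[(i, k_best)] + best[(k_best, j)]

    # Follow the stored split points to recover the tree's spans.
    spans: List[Span] = []

    def collect(i: int, j: int) -> None:
        spans.append((i, j))
        if j - i > 1:
            k = split[(i, j)]
            collect(i, k)
            collect(k, j)

    collect(0, n)
    return best[(0, n)], spans
```

For a three-word sentence where span (0, 2) scores 2.0 and span (1, 3) scores 1.0, the two spans cross and cannot both appear in one tree, so the decoder keeps (0, 2). The chart has O(n^2) cells and each cell tries O(n) split points, which gives the usual O(n^3) cost of CKY-style inference.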
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
