Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese
Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

TL;DR
This paper introduces a neural model that uses a simplified constituency parser to improve Vietnamese and Chinese word segmentation and part-of-speech tagging by incorporating syntactic information, outperforming previous methods.
Contribution
The paper presents a novel joint model employing a simplified constituency parser with a single phrase label to enhance segmentation and POS tagging for Vietnamese and Chinese.
Findings
Higher performance than previous methods on Vietnamese datasets
Effective use of syntactic information improves tagging accuracy
Model generalizes well across multiple Chinese datasets
Abstract
Word segmentation and part-of-speech tagging are two critical preliminary steps for downstream tasks in Vietnamese natural language processing. In practice, people tend to also consider phrase boundaries when performing word segmentation and part-of-speech tagging, rather than solely processing word by word from left to right. In this paper, we implement this idea to improve word segmentation and part-of-speech tagging for the Vietnamese language by employing a simplified constituency parser. Our neural model for joint word segmentation and part-of-speech tagging has the architecture of a syllable-based CRF constituency parser. To reduce the complexity of parsing, we replace all constituent labels with a single label indicating phrases. This model can be augmented with word boundaries and part-of-speech tags predicted by other tools. Because Vietnamese and Chinese have some similar…
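The label simplification described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The nested-tuple tree format, the placeholder label `XP`, the toy sentence with its tags, and the function names `collapse_labels`, `syllable_spans`, and `words_and_tags` are all assumptions made for illustration. The sketch collapses every phrase label to one label, keeps part-of-speech tags on the word-level nodes, and lists the resulting spans over syllables.

```python
from typing import List, Tuple, Union

# A tree is (label, body). A word-level node has a string body holding the
# word's syllables separated by spaces; a phrase node has a list of subtrees.
# This format is an assumption for illustration, not the paper's data format.
Tree = Tuple[str, Union[str, List["Tree"]]]

PHRASE = "XP"  # hypothetical name for the single phrase label


def collapse_labels(tree: Tree) -> Tree:
    """Replace every phrase label with PHRASE; keep POS tags on word nodes."""
    label, body = tree
    if isinstance(body, str):
        return (label, body)
    return (PHRASE, [collapse_labels(child) for child in body])


def syllable_spans(tree: Tree, start: int = 0) -> Tuple[int, List[Tuple[int, int, str]]]:
    """Return (end, spans), where each span is (i, j, label) over syllables."""
    label, body = tree
    if isinstance(body, str):
        end = start + len(body.split())
        return end, [(start, end, label)]
    spans: List[Tuple[int, int, str]] = []
    pos = start
    for child in body:
        pos, child_spans = syllable_spans(child, pos)
        spans.extend(child_spans)
    spans.append((start, pos, label))
    return pos, spans


def words_and_tags(tree: Tree) -> List[Tuple[str, str]]:
    """Read the segmentation and POS tags off the word-level nodes."""
    label, body = tree
    if isinstance(body, str):
        return [(body, label)]
    return [pair for child in body for pair in words_and_tags(child)]


# Toy example with illustrative labels and tags.
tree: Tree = ("S", [("NP", [("N", "sinh viên")]),
                    ("VP", [("V", "học"), ("N", "bài")])])
collapsed = collapse_labels(tree)
end, spans = syllable_spans(collapsed)
print(collapsed)
print(spans)
print(words_and_tags(collapsed))
```

In this sketch, the spans that carry a part-of-speech label give the word segmentation and tags, while the `XP` spans carry only phrase boundaries. That is the syntactic signal the abstract describes, without the cost of predicting a full constituent label set.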
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Conditional Random Field
