Yunshan Cup 2020: Overview of the Part-of-Speech Tagging Task for Low-resourced Languages
Yingwen Fu, Jinyi Chen, Nankai Lin, Xixuan Huang, Xinying Qiu, Shengyi Jiang

TL;DR
The Yunshan Cup 2020 evaluated POS tagging methods for Indonesian and Lao, two low-resource languages. Neural sequence labeling models outperformed feature-based and rule-based methods, with the best systems reaching 95.82% accuracy on Indonesian and 93.03% on Lao.
Contribution
This paper provides an overview of a competitive evaluation of POS tagging methods for low-resource languages, highlighting the effectiveness of neural sequence models.
Findings
Neural models achieved over 95% accuracy for Indonesian.
Traditional methods lagged behind neural approaches.
Ensemble neural methods performed best.
Abstract
The Yunshan Cup 2020 track focused on creating a framework for evaluating different methods of part-of-speech (POS) tagging. The track had two tasks: (1) POS tagging for the Indonesian language, and (2) POS tagging for the Lao language. The Indonesian dataset comprises 10,000 sentences from Indonesian news annotated with 29 tags, and the Lao dataset consists of 8,000 sentences annotated with 27 tags. Twenty-five teams registered for the task. Participants' methods ranged from feature-based approaches to neural networks, using either classical machine learning techniques or ensemble methods. The best-performing systems achieve an accuracy of 95.82% for Indonesian and 93.03% for Lao, showing that neural sequence labeling models significantly outperform classic feature-based and rule-based methods.
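The abstract reports accuracy but does not say how it was computed. As a minimal sketch, assuming the metric is token-level accuracy (the share of tokens whose predicted tag matches the gold tag), it could be calculated as below. The function name `token_accuracy` and the tag labels in the example are hypothetical and are not taken from the task's 29-tag or 27-tag sets.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Assumption: accuracy is token-level and pooled over all sentences,
    which the source abstract does not confirm.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must have the same number of tags")
        # Count positions where the predicted tag matches the gold tag.
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)

    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Illustrative tags only; the real tag inventories are not listed in the source.
gold_tags = [["NOUN", "VERB"], ["PRON", "VERB", "NOUN"]]
pred_tags = [["NOUN", "VERB"], ["PRON", "NOUN", "NOUN"]]
print(f"{token_accuracy(gold_tags, pred_tags):.2%}")
```

Pooling counts over all tokens, rather than averaging per-sentence scores, keeps long sentences from being under-weighted. Whether the organizers scored submissions this way is not stated in the source.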
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
