Yunshan Cup 2020: Overview of the Part-of-Speech Tagging Task for Low-resourced Languages

Yingwen Fu; Jinyi Chen; Nankai Lin; Xixuan Huang; Xinying Qiu; Shengyi Jiang

arXiv:2204.02658·cs.CL·April 7, 2022·1 cites

Open Access 1 Datasets

TL;DR

The Yunshan Cup 2020 evaluated POS tagging methods for Indonesian and Lao. Neural sequence labeling models outperformed feature-based and rule-based methods, with the best results reaching 95.82% accuracy on Indonesian and 93.03% on Lao.

Contribution

This paper provides an overview of a competitive evaluation of POS tagging methods for low-resource languages, highlighting the effectiveness of neural sequence models.

Findings

01 The best-performing results reached 95.82% accuracy for Indonesian and 93.03% for Lao.

02 Feature-based and rule-based methods lagged behind neural sequence labeling models.

03 Ensemble neural methods performed best.

Abstract

The Yunshan Cup 2020 track focused on creating a framework for evaluating different methods of part-of-speech (POS) tagging. The track had two tasks: (1) POS tagging for the Indonesian language, and (2) POS tagging for the Lao language. The Indonesian dataset comprises 10,000 sentences from Indonesian news, annotated with 29 tags. The Lao dataset consists of 8,000 sentences annotated with 27 tags. Twenty-five teams registered for the task. Participants' methods ranged from feature-based approaches to neural networks, using either classical machine learning techniques or ensemble methods. The best-performing results achieved an accuracy of 95.82% for Indonesian and 93.03% for Lao, showing that neural sequence labeling models significantly outperform classic feature-based and rule-based methods.
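The accuracy figures above can be read as the share of tokens that receive the correct tag. The following is a minimal sketch of that computation, assuming token-level (not sentence-level) accuracy, which the abstract does not state explicitly. The function name `token_accuracy` and the tags in the example are hypothetical and are not taken from the task's tagsets or scoring script.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Assumption: accuracy is computed per token over the whole test set.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence must have one predicted tag per token")
        # Count positions where the predicted tag matches the gold tag.
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)

    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Illustrative tags only; two sentences, five tokens, one tagging error.
gold = [["NN", "VB", "NN"], ["PR", "VB"]]
pred = [["NN", "VB", "JJ"], ["PR", "VB"]]
score = token_accuracy(gold, pred)
print(f"{score:.2%}")
```

Under this reading, a score of 95.82% would mean that roughly 96 of every 100 test tokens were tagged correctly.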

Code & Models

Datasets

SEACrowd/yunshan_cup_2020
dataset · 18 dl

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling