Limpeh ga li gong: Challenges in Singlish Annotations
Luo Qi Chan, Lynnette Hui Xian Ng

TL;DR
This paper explores the challenges of POS tagging in Singlish, a colloquial language from Singapore, highlighting the difficulties in annotation and the limited accuracy of current automatic taggers, thereby paving the way for future research.
Contribution
It introduces a new Singlish dataset with human-annotated POS tags and analyzes the limitations of existing automatic tagging methods on this language.
Findings
Automatic taggers achieve only ~80% accuracy on Singlish.
Singlish's unique features cause annotation and tagging challenges.
The dataset reveals significant variability and complexity in Singlish language use.
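The accuracy figure above is a comparison of tagger output against human-annotated labels. As a minimal sketch of how such a token-level score could be computed (the function name `tagging_accuracy` and the toy tag sequences are hypothetical illustrations, not taken from the paper or its dataset):

```python
def tagging_accuracy(gold, pred):
    """Token-level accuracy over parallel lists of POS tag sequences.

    gold, pred: lists of sentences, each sentence a list of tag strings.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must have the same number of tokens")
        # Count tokens where the predicted tag matches the human label.
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Toy data for illustration only (not from the paper's dataset).
gold = [["PRON", "VERB", "PART"], ["NOUN", "VERB"]]
pred = [["PRON", "VERB", "INTJ"], ["NOUN", "VERB"]]
print(tagging_accuracy(gold, pred))
```

In this toy case the tagger mislabels one of five tokens. Scoring per token rather than per sentence is an assumption here; the paper may aggregate differently.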
Abstract
Singlish, or Colloquial Singapore English, is a language formed from oral and social communication within multicultural Singapore. In this work, we address a fundamental Natural Language Processing (NLP) task: Part-of-Speech (POS) tagging of Singlish sentences. For our analysis, we build a parallel Singlish dataset containing direct English translations and POS tags, with translation and POS annotation done by native Singlish speakers. Our experiments show that automatic transition- and transformer-based taggers achieve only ~80% accuracy when evaluated against human-annotated POS labels, suggesting that there is indeed room for improvement in computational analysis of the language. We provide an exposition of challenges in Singlish annotation: its inconsistencies in form and semantics, the highly context-dependent particles of the language, its structural unique…
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications
