Bridging the Gap: Transfer Learning from English PLMs to Malaysian English
Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

TL;DR
This paper introduces MENmBERT and MENBERT, pre-trained language models specialized for Malaysian English. Trained on language-specific data, they improve performance on NER and RE tasks, although the overall NER gain is modest.
Contribution
The paper presents novel Malaysian English-specific PLMs, MENmBERT and MENBERT, fine-tuned on annotated data, enhancing NER and RE performance in low-resource Malaysian English contexts.
Findings
MENmBERT improved NER by 1.52% and RE by 26.27% over baseline models.
Significant performance gains observed for 12 specific entity labels.
Pre-training on local language data benefits low-resource NLP tasks.
Abstract
Malaysian English is a low-resource creole language that carries elements of Malay, Chinese, and Tamil in addition to Standard English. Named Entity Recognition (NER) models underperform when capturing entities from Malaysian English text due to its distinctive morphosyntactic adaptations, semantic features and code-switching (mixing English and Malay). Considering these gaps, we introduce MENmBERT and MENBERT, pre-trained language models with contextual understanding, specifically tailored for Malaysian English. We have fine-tuned MENmBERT and MENBERT using manually annotated entities and relations from the Malaysian English News Article (MEN) Dataset. This fine-tuning process allows the PLMs to learn representations that capture the nuances of Malaysian English relevant for NER and RE tasks. MENmBERT achieved a 1.52% and 26.27% improvement on NER and RE tasks…
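To make the fine-tuning input concrete: token-classification models for NER are commonly trained on per-token BIO tags derived from annotated entity spans. The sketch below shows that conversion only. It is an illustration under stated assumptions, not the authors' pipeline: the function name spans_to_bio, the example sentence, and the labels PERSON and LOCATION are hypothetical and are not taken from the MEN Dataset.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert token-level entity spans (start, end_exclusive, label) to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        # Reject spans that fall outside the sentence or are empty.
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        # Reject spans that overlap an entity that was already tagged.
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical code-switched sentence; NOT drawn from the MEN Dataset.
tokens = ["Datuk", "Ahmad", "visited", "the", "pasar", "malam", "in", "Petaling", "Jaya"]
# Hypothetical labels; the paper's actual label set is not listed in this summary.
spans = [(0, 2, "PERSON"), (7, 9, "LOCATION")]

tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Tags in this form can then be aligned to subword tokens and passed to any token-classification head; that alignment step depends on the tokenizer and is left out here.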
Taxonomy
Topics: Second Language Learning and Teaching · Second Language Acquisition and Learning
