Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models
Bin Wang, Xunlong Zou, Shuo Sun, Wenyu Zhang, Yingxu He, Zhuohan Liu, Chengwei Wei, Nancy F. Chen, AiTi Aw

TL;DR
This paper introduces a standardized spoken Singlish corpus and a multimodal model, SingAudioLLM, enabling advanced multilingual speech tasks and achieving state-of-the-art results in Singlish understanding.
Contribution
The work provides the largest annotated spoken Singlish dataset and a novel multimodal model, advancing research in low-resource, multilingual speech processing.
Findings
Achieved 10-30% performance improvements over previous models.
Created the largest standardized spoken Singlish corpus with annotations.
Demonstrated the effectiveness of multimodal models in Singlish tasks.
Abstract
Singlish, a Creole language rooted in English, is a key focus in linguistic research within multilingual and multicultural contexts. However, its spoken form remains underexplored, limiting insights into its linguistic structure and applications. To address this gap, we standardize and annotate the largest spoken Singlish corpus, introducing the Multitask National Speech Corpus (MNSC). These datasets support diverse tasks, including Automatic Speech Recognition (ASR), Spoken Question Answering (SQA), Spoken Dialogue Summarization (SDS), and Paralinguistic Question Answering (PQA). We release standardized splits and a human-verified test set to facilitate further research. Additionally, we propose SingAudioLLM, a multi-task multimodal model leveraging multimodal large language models to handle these tasks concurrently. Experiments reveal our model's adaptability to Singlish context,…
Code & Models
- 🤗 MERaLiON/MERaLiON-3-10B-preview · model · 322 dl · ♡ 1
- 🤗 MERaLiON/MERaLiON-2-10B · model · 711 dl · ♡ 11
- 🤗 MERaLiON/MERaLiON-2-3B · model · 2.6k dl · ♡ 5
- 🤗 MERaLiON/MERaLiON-2-10B-ASR · model · 1.4k dl · ♡ 10
- 🤗 lewiswoncy/m_test_9 · model · 42 dl
- 🤗 lewiswoncy/m_test_9_11 · model · 2 dl
- 🤗 MERaLiON/MERaLiON-2-3B-MLX · model · 8 dl
- 🤗 MERaLiON/MERaLiON-2-10B-MLX · model · 12 dl
Taxonomy
Topics: Education and Critical Thinking Development
Methods: Sparse Evolutionary Training · Focus
