MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models

Yingxu He; Zhuohan Liu; Shuo Sun; Bin Wang; Wenyu Zhang; Xunlong Zou; Nancy F. Chen; Ai Ti Aw

arXiv:2412.09818 · cs.CL · January 17, 2025

TL;DR

MERaLiON-AudioLLM is a pioneering multilingual speech-text model designed for Singapore's diverse linguistic landscape, improving speech recognition and understanding in complex, multicultural environments.

Contribution

MERaLiON-AudioLLM is the first speech-text model tailored for Singapore's multilingual context, integrating advanced speech and text processing for localized AI applications.

Findings

01 Enhanced speech recognition accuracy in multilingual settings

02 Improved task-specific understanding for regional dialects

03 Demonstrated effectiveness in complex, multicultural environments

Abstract

We introduce MERaLiON-AudioLLM (Multimodal Empathetic Reasoning and Learning in One Network), the first speech-text model tailored for Singapore's multilingual and multicultural landscape. Developed under the National Large Language Models Funding Initiative, Singapore, MERaLiON-AudioLLM integrates advanced speech and text processing to address the diverse linguistic nuances of local accents and dialects, enhancing accessibility and usability in complex, multilingual environments. Our results demonstrate improvements in both speech recognition and task-specific understanding, positioning MERaLiON-AudioLLM as a pioneering solution for region-specific AI applications. We envision this release setting a precedent for future models designed to address localised linguistic and cultural contexts in a global framework.
