MaLLaM -- Malaysia Large Language Model

Husein Zolkepli; Aisyah Razak; Kamarul Adha; Ariff Nazhan

arXiv:2401.14680 · cs.CL · January 30, 2024

TL;DR

MaLLaM is a set of large language models trained from scratch on Malaysian data. Its instruction-tuned variants perform competitively on Malay language understanding and generation tasks, contributing to localized NLP.

Contribution

This work introduces MaLLaM, the first large-scale Malay language models trained from scratch, with up to 5 billion parameters, tailored for Malaysian language understanding and generation.

Findings

01. Instruction-tuned MaLLaM models perform competitively against ChatGPT3.5 and Malaysian Mistral.

02. Instruction-tuned MaLLaM models show notable proficiency in Malay language tasks.

03. The models capture the nuances of the Malaysian language.

Abstract

Addressing the gap in Large Language Models pretrained from scratch with Malaysian context, we trained models with 1.1 billion, 3 billion, and 5 billion parameters for a single epoch on a substantial 349GB dataset, equivalent to 90 billion tokens under our pretrained Byte Pair Encoding (BPE) tokenizer. MaLLaM contributes to enhanced natural language understanding and generation in the Malay language. Although trained on a comparatively small dataset of 90 billion tokens, our instruction-tuned MaLLaM models perform competitively. When compared to ChatGPT3.5 and Malaysian Mistral, MaLLaM's instruction-tuned models demonstrate notable proficiency, underscoring the effectiveness of our approach in capturing and understanding the nuances of the Malaysian language. MaLLaM models mark a significant contribution to the field, providing comprehensive language representations grounded in Malaysian…
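
The figures quoted in the abstract imply two rough ratios that help put the training budget in context: average bytes of text per token, and tokens seen per model parameter. The sketch below is only back-of-the-envelope arithmetic on those quoted numbers. It assumes 1 GB = 10^9 bytes (the abstract does not say whether decimal or binary units are meant), and the function names are illustrative, not taken from the paper.

```python
# Figures quoted in the abstract.
DATASET_GB = 349          # dataset size in GB
TOKENS = 90e9             # token count under the authors' BPE tokenizer
PARAM_COUNTS = {"1.1B": 1.1e9, "3B": 3e9, "5B": 5e9}

# Assumption: decimal gigabytes. The abstract does not specify the unit.
BYTES_PER_GB = 10**9


def bytes_per_token(dataset_gb, tokens, bytes_per_gb=BYTES_PER_GB):
    """Average number of raw bytes covered by one token."""
    return dataset_gb * bytes_per_gb / tokens


def tokens_per_parameter(tokens, params):
    """Training tokens seen per model parameter in a single epoch."""
    return tokens / params


if __name__ == "__main__":
    print(f"bytes per token: {bytes_per_token(DATASET_GB, TOKENS):.2f}")
    for name, params in PARAM_COUNTS.items():
        ratio = tokens_per_parameter(TOKENS, params)
        print(f"{name}: {ratio:.1f} tokens per parameter")
```

Under the decimal-GB assumption this gives roughly 3.9 bytes per token, and about 82, 30, and 18 tokens per parameter for the 1.1B, 3B, and 5B models respectively.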

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling