SEA-HELM: Southeast Asian Holistic Evaluation of Language Models
Yosephine Susanto, Adithya Venkatadri Hulagadri, Jann Railey Montalan, Jian Gang Ngui, Xian Bin Yong, Weiqi Leong, Hamsawardhini Rengarajan, Peerat Limkonchotiwat, Yifan Mai, William Chandra Tjhi

TL;DR
SEA-HELM is a comprehensive evaluation suite designed to assess large language models' multilingual and multicultural capabilities specifically for Southeast Asian languages, addressing a gap in culturally representative benchmarks.
Contribution
It introduces SEA-HELM, the first holistic evaluation framework for SEA languages, covering linguistic, cultural, safety, and model-specific aspects, with an accessible leaderboard.
Findings
Supports Filipino, Indonesian, Tamil, Thai, Vietnamese
Provides systematic multilingual and multicultural performance insights
Open-source evaluation code available
Abstract
With the rapid emergence of novel capabilities in Large Language Models (LLMs), the need for rigorous, integrated multilingual and multicultural benchmarks has become more pronounced. Though existing LLM benchmarks are capable of evaluating specific capabilities of LLMs in English as well as in various mid- to low-resource languages, including those in the Southeast Asian (SEA) region, a comprehensive and culturally representative evaluation suite for the SEA languages has not been developed thus far. Here, we present SEA-HELM, a holistic linguistic and cultural LLM evaluation suite that emphasises SEA languages, comprising five core pillars: (1) NLP Classics, (2) LLM-specifics, (3) SEA Linguistics, (4) SEA Culture, (5) Safety. SEA-HELM currently supports Filipino, Indonesian, Tamil, Thai, and Vietnamese. We also introduce the SEA-HELM leaderboard, which allows users to…
Code & Models
- 🤗 Sahabat-AI/Llama-Sahabat-AI-v2-70B-IT · model · 110 dl · ♡ 13
- 🤗 aisingapore/Gemma-SEA-LION-v3-9B-IT · model · 1.1k dl · ♡ 14
- 🤗 aisingapore/Gemma-SEA-LION-v3-9B · model · 43 dl · ♡ 7
- 🤗 aisingapore/Llama-SEA-LION-v3-8B · model · 50 dl · ♡ 2
- 🤗 aisingapore/Llama-SEA-LION-v3-8B-IT · model · 2.1k dl · ♡ 8
- 🤗 aisingapore/Llama-SEA-LION-v3-70B · model · 8 dl · ♡ 1
- 🤗 aisingapore/Llama-SEA-LION-v3-70B-IT · model · 827 dl · ♡ 4
- 🤗 aisingapore/Llama-SEA-LION-v3.5-8B-R · model · 1.3k dl · ♡ 11
- 🤗 aisingapore/Llama-SEA-LION-v3.5-70B-R · model · 21 dl · ♡ 2
- 🤗 GoToCompany/Llama-Sahabat-AI-v2-70B-IT · model · 5 dl · ♡ 8
Taxonomy
Topics: Computational and Text Analysis Methods · Natural Language Processing Techniques
