Measuring Hong Kong Massive Multi-Task Language Understanding

Chuxue Cao; Zhenghao Zhu; Junqi Zhu; Guoying Lu; Siyu Peng; Juntao Dai; Weijie Shi; Sirui Han; Yike Guo

arXiv:2505.02177·cs.CL·May 6, 2025

TL;DR

This paper introduces HKMMLU, a benchmark for evaluating the multilingual and socio-cultural understanding of LLMs in Hong Kong's linguistic context, which combines Traditional Chinese script with spoken Cantonese. Results show a clear performance gap relative to MMLU and CMMLU, pointing to the need for models better adapted to Hong Kong.

Contribution

The paper presents HKMMLU, a multi-task benchmark tailored to Hong Kong's linguistic and cultural landscape. It comprises 26,698 multi-choice questions across 66 subjects, plus 90,550 Mandarin-Cantonese translation tasks, for evaluating LLMs' capabilities.

Findings

01

DeepSeek-V3 reaches at most 75% accuracy on HKMMLU, lower than the accuracy seen on MMLU and CMMLU.

02

Model performance varies with question language, model size, and prompting strategy.

03

The significant performance gap indicates a need for improved Hong Kong-specific language models.

Abstract

Multilingual understanding is crucial for the cross-cultural applicability of Large Language Models (LLMs). However, evaluation benchmarks designed for Hong Kong's unique linguistic landscape, which combines Traditional Chinese script with Cantonese as the spoken form and its cultural context, remain underdeveloped. To address this gap, we introduce HKMMLU, a multi-task language understanding benchmark that evaluates Hong Kong's linguistic competence and socio-cultural knowledge. The HKMMLU includes 26,698 multi-choice questions across 66 subjects, organized into four categories: Science, Technology, Engineering, and Mathematics (STEM), Social Sciences, Humanities, and Other. To evaluate the multilingual understanding ability of LLMs, 90,550 Mandarin-Cantonese translation tasks were additionally included. We conduct comprehensive experiments on GPT-4o, Claude 3.7 Sonnet, and 18…
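As a rough illustration of how accuracy on a multiple-choice benchmark organized into the four categories above might be tallied, here is a minimal Python sketch. The record fields (`category`, `answer`, `prediction`) and the sample rows are hypothetical; they are not taken from the HKMMLU dataset schema or from the paper's evaluation code.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only,
# not the actual HKMMLU schema or real model outputs.
records = [
    {"category": "STEM", "answer": "A", "prediction": "A"},
    {"category": "STEM", "answer": "C", "prediction": "B"},
    {"category": "Humanities", "answer": "D", "prediction": "D"},
    {"category": "Social Sciences", "answer": "B", "prediction": "B"},
    {"category": "Other", "answer": "A", "prediction": "C"},
]


def accuracy_by_category(rows):
    """Return the fraction of correctly answered questions per category."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row["category"]] += 1
        if row["prediction"] == row["answer"]:
            correct[row["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


def micro_average(rows):
    """Accuracy over all questions, so larger categories weigh more."""
    hits = sum(1 for row in rows if row["prediction"] == row["answer"])
    return hits / len(rows)


def macro_average(per_category):
    """Unweighted mean of per-category accuracies."""
    return sum(per_category.values()) / len(per_category)


per_category = accuracy_by_category(records)
for category, acc in per_category.items():
    print(f"{category}: {acc:.2f}")
print(f"micro: {micro_average(records):.3f}")
print(f"macro: {macro_average(per_category):.3f}")
```

The sketch reports both a micro average (every question counts equally) and a macro average (every category counts equally). Which of the two a given leaderboard number corresponds to is not stated in the text above, so treat the choice as an assumption when comparing scores.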

Code & Models

Datasets

chuxuecao/HKMMLU
dataset · 607 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling