Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao, Zhenghao Zhu, Junqi Zhu, Guoying Lu, Siyu Peng, Juntao Dai, Weijie Shi, Sirui Han, Yike Guo

TL;DR
This paper introduces HKMMLU, a comprehensive benchmark for evaluating the multilingual and socio-cultural understanding of LLMs in Hong Kong's unique linguistic context. The results reveal significant performance gaps that can inform future model development.
Contribution
The paper presents HKMMLU, a novel multi-task benchmark tailored to Hong Kong's linguistic and cultural landscape. It covers 66 subjects plus Mandarin-Cantonese translation tasks for evaluating LLMs' capabilities.
Findings
DeepSeek-V3 reaches up to 75% accuracy on HKMMLU, still below accuracy on MMLU and CMMLU.
Model performance is affected by question language, model size, and prompting strategy.
The significant performance gap indicates a need for improved Hong Kong-specific language models.
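The accuracy figures above are multiple-choice accuracies. As a rough illustration only, the sketch below shows one way such scores could be aggregated per category and overall. The record layout (`category`, `answer`, `prediction`), the function names, and the toy records are all assumptions made for this example; they are not taken from the paper or its evaluation code.

```python
from collections import defaultdict

def accuracy_by_category(records):
    """Return {category: accuracy} for multiple-choice records.

    Each record is a hypothetical dict with the keys 'category',
    'answer' (gold option letter) and 'prediction' (model's letter).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[r["category"]] += 1
    return {c: correct[c] / total[c] for c in total}

def micro_accuracy(records):
    """Overall accuracy: every question counts equally."""
    hits = sum(
        r["prediction"].strip().upper() == r["answer"].strip().upper()
        for r in records
    )
    return hits / len(records)

# Toy records, invented for illustration (not benchmark data).
records = [
    {"category": "STEM", "answer": "A", "prediction": "A"},
    {"category": "STEM", "answer": "B", "prediction": "C"},
    {"category": "Humanities", "answer": "D", "prediction": "d"},
]

per_category = accuracy_by_category(records)
overall = micro_accuracy(records)
```

Reporting both views is a design choice: per-category scores show where a model is weak, while the micro-average weights every question equally, so large categories dominate it.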
Abstract
Multilingual understanding is crucial for the cross-cultural applicability of Large Language Models (LLMs). However, evaluation benchmarks designed for Hong Kong's unique linguistic landscape, which combines Traditional Chinese script with Cantonese as the spoken form and its cultural context, remain underdeveloped. To address this gap, we introduce HKMMLU, a multi-task language understanding benchmark that evaluates Hong Kong's linguistic competence and socio-cultural knowledge. The HKMMLU includes 26,698 multi-choice questions across 66 subjects, organized into four categories: Science, Technology, Engineering, and Mathematics (STEM), Social Sciences, Humanities, and Other. To evaluate the multilingual understanding ability of LLMs, 90,550 Mandarin-Cantonese translation tasks were additionally included. We conduct comprehensive experiments on GPT-4o, Claude 3.7 Sonnet, and 18…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
