MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs
Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao

TL;DR
This paper introduces MM-Eval, a comprehensive benchmark dataset designed to evaluate the language and cognitive abilities of large language models in Modern Mongolian, highlighting performance gaps and facilitating advancements in low-resource language NLP.
Contribution
The paper presents MM-Eval, a novel hierarchical benchmark dataset specifically for Modern Mongolian, enabling systematic evaluation of LLMs in low-resource language contexts.
Findings
Models perform better on syntactic than semantic tasks.
Knowledge tasks show only a moderate decline, suggesting that models can transfer general knowledge from high-resource to low-resource contexts.
MM-Eval covers syntax, semantics, knowledge, and reasoning tasks for evaluating LLMs in Modern Mongolian.
Abstract
Large language models (LLMs) excel in high-resource languages but face notable challenges in low-resource languages like Mongolian. This paper addresses these challenges by categorizing capabilities into language abilities (syntax and semantics) and cognitive abilities (knowledge and reasoning). To systematically evaluate these areas, we developed MM-Eval, a specialized dataset based on Modern Mongolian Language Textbook I and enriched with WebQSP and MGSM datasets. Preliminary experiments on models including Qwen2-7B-Instruct, GLM4-9b-chat, Llama3.1-8B-Instruct, GPT-4, and DeepseekV2.5 revealed that: 1) all models performed better on syntactic tasks than semantic tasks, highlighting a gap in deeper language understanding; and 2) knowledge tasks showed a moderate decline, suggesting that models can transfer general knowledge from high-resource to low-resource contexts. The release…
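The two-level categorization described in the abstract (language abilities split into syntax and semantics, cognitive abilities split into knowledge and reasoning) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `accuracy_by_level`, the record format, and the toy records are assumptions made purely for illustration, and the numbers are not results from the paper.

```python
from collections import defaultdict

# Two-level hierarchy as named in the abstract.
HIERARCHY = {
    "language": ["syntax", "semantics"],
    "cognitive": ["knowledge", "reasoning"],
}


def accuracy_by_level(results):
    """Aggregate accuracy per subtask and per top-level ability.

    `results` is an iterable of (subtask, is_correct) pairs; this record
    format is hypothetical, not the MM-Eval file format.
    """
    # Map each subtask back to its top-level ability.
    parent = {sub: top for top, subs in HIERARCHY.items() for sub in subs}
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for subtask, is_correct in results:
        # Count each record once for its subtask and once for its parent.
        for key in (subtask, parent[subtask]):
            counts[key][0] += int(is_correct)
            counts[key][1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


# Made-up records that only exercise the function.
toy_results = [
    ("syntax", True),
    ("syntax", True),
    ("semantics", True),
    ("semantics", False),
    ("knowledge", True),
    ("reasoning", False),
]
scores = accuracy_by_level(toy_results)
```

Reporting both levels side by side makes comparisons such as syntax versus semantics, or language versus cognitive abilities, straightforward to read off one dictionary.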
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention
