MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs

Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao

arXiv:2411.09492 · cs.CL · November 15, 2024





TL;DR

This paper introduces MM-Eval, a comprehensive benchmark dataset designed to evaluate the language and cognitive abilities of large language models in Modern Mongolian, highlighting performance gaps and facilitating advancements in low-resource language NLP.

Contribution

The paper presents MM-Eval, a novel hierarchical benchmark dataset specifically for Modern Mongolian, enabling systematic evaluation of LLMs in low-resource language contexts.

Findings

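1. All evaluated models perform better on syntactic tasks than on semantic tasks.
2. Performance on knowledge tasks declines only moderately, suggesting that models can transfer general knowledge from high-resource to low-resource language contexts.
3. MM-Eval provides a broad set of tasks for assessing LLMs in Modern Mongolian.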

Abstract

Large language models (LLMs) excel in high-resource languages but face notable challenges in low-resource languages like Mongolian. This paper addresses these challenges by categorizing capabilities into language abilities (syntax and semantics) and cognitive abilities (knowledge and reasoning). To systematically evaluate these areas, we developed MM-Eval, a specialized dataset based on Modern Mongolian Language Textbook I and enriched with the WebQSP and MGSM datasets. Preliminary experiments on models including Qwen2-7B-Instruct, GLM4-9b-chat, Llama3.1-8B-Instruct, GPT-4, and DeepseekV2.5 revealed that: 1) all models performed better on syntactic tasks than on semantic tasks, highlighting a gap in deeper language understanding; and 2) knowledge tasks showed a moderate decline, suggesting that models can transfer general knowledge from high-resource to low-resource contexts. The release…
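The abstract's two-level split (language abilities: syntax, semantics; cognitive abilities: knowledge, reasoning) implies that scores can be read both per subcategory and per top-level ability. Below is a minimal sketch of that aggregation, assuming each benchmark item is scored simply as correct or incorrect. The `Result` record, the `TAXONOMY` mapping, the function names, and `TOY_RESULTS` are hypothetical illustrations, not the data format or scoring code of the MM-Eval repository, and the toy numbers are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical two-level taxonomy mirroring the abstract:
# top-level ability -> leaf subcategories.
TAXONOMY = {
    "language": ("syntax", "semantics"),
    "cognitive": ("knowledge", "reasoning"),
}


@dataclass
class Result:
    category: str  # a leaf category, e.g. "syntax" (assumed labeling scheme)
    correct: bool  # whether the model's answer matched the reference


def leaf_accuracy(results):
    """Accuracy for each leaf subcategory that has at least one item."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r.category] += 1
        hits[r.category] += int(r.correct)
    return {c: hits[c] / totals[c] for c in totals}


def ability_accuracy(results):
    """Micro-averaged accuracy per top-level ability (each item counts once)."""
    parent = {leaf: top for top, leaves in TAXONOMY.items() for leaf in leaves}
    totals, hits = defaultdict(int), defaultdict(int)
    for r in results:
        top = parent[r.category]  # raises KeyError for an unknown category
        totals[top] += 1
        hits[top] += int(r.correct)
    return {a: hits[a] / totals[a] for a in totals}


# Made-up toy outcomes for illustration only; NOT MM-Eval results.
TOY_RESULTS = [
    Result("syntax", True),
    Result("syntax", True),
    Result("semantics", True),
    Result("semantics", False),
    Result("knowledge", True),
    Result("reasoning", False),
]

if __name__ == "__main__":
    print(leaf_accuracy(TOY_RESULTS))
    print(ability_accuracy(TOY_RESULTS))
```

Here the ability-level scores are micro-averaged, so every item counts equally; averaging the subcategory accuracies instead (macro-averaging) would stop a large subcategory from dominating its parent. Which scheme MM-Eval itself uses is not stated on this page.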


Code & Models

Repositories

joenahm/mm-eval (official)

Models

munkhbayar-batkhuu/Llama-3.2-3B-Instruct-Mongolian (Hugging Face model)


Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention