ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding
Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Yonas Chanie, Bontu Fufa Balcha, Negasi Haile Abadi, Henok Biadglign Ademtew, Mulubrhan Abebe Nerea, Debela Desalegn Yadeta, Derartu Dagne Geremew, Assefa Atsbiha Tesfau, Philipp Slusallek, Thamar Solorio

TL;DR
ProverbEval introduces a benchmark for evaluating low-resource language understanding in cultural contexts and highlights factors, such as answer-choice order and the language of evaluation, that cause variance in LLM performance.
Contribution
This work presents ProverbEval, a new benchmark designed specifically for low-resource languages that emphasizes culture-specific content, and analyzes the factors that influence LLM evaluation outcomes.
Findings
Performance varies by up to 50% depending on the order of the answer choices.
Proverb descriptions written in the native language improve performance on tasks such as proverb generation.
Monolingual evaluations outperform cross-lingual ones.
Abstract
With the rapid development of evaluation datasets that assess LLMs' understanding across a wide range of subjects and domains, identifying a suitable language understanding benchmark has become increasingly challenging. In this work, we explore LLM evaluation challenges for low-resource language understanding and introduce ProverbEval, an LLM evaluation benchmark for low-resource languages that focuses on language understanding in culture-specific scenarios. We benchmark various LLMs and explore the factors that create variability in the benchmarking process. We observed performance variances of up to 50%, depending on the order in which answer choices were presented in multiple-choice tasks. Native-language proverb descriptions significantly improve performance on tasks such as proverb generation. Additionally, monolingual evaluations consistently outperformed…
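One way to measure the answer-order sensitivity reported in the abstract is to score the same items under every permutation of the answer choices and report the gap between the best and worst ordering. The sketch below is a minimal illustration under stated assumptions, not the paper's protocol: `MCItem`, `render_prompt`, `accuracy_by_order`, and `order_spread` are hypothetical names, four choices labeled A to D are assumed, the items are placeholders, and `always_a` is a toy stand-in for an LLM with a pure position bias.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

LETTERS = "ABCD"  # assumption: four answer choices per item


@dataclass
class MCItem:
    """One multiple-choice item (hypothetical structure, not the dataset's schema)."""
    question: str
    choices: List[str]
    answer: str  # text of the correct choice


def render_prompt(item: MCItem, order: Sequence[int]) -> Tuple[str, str]:
    """Lay out the choices in `order`; return (prompt, letter of the correct choice)."""
    lines = [item.question]
    gold = ""
    for letter, idx in zip(LETTERS, order):
        choice = item.choices[idx]
        lines.append(f"{letter}. {choice}")
        if choice == item.answer:
            gold = letter
    return "\n".join(lines), gold


def accuracy_by_order(
    items: List[MCItem], predict: Callable[[str], str], n_choices: int = 4
) -> Dict[Tuple[int, ...], float]:
    """Compute accuracy separately for every permutation of the choice order."""
    results = {}
    for order in permutations(range(n_choices)):
        correct = 0
        for item in items:
            prompt, gold = render_prompt(item, order)
            correct += int(predict(prompt) == gold)
        results[order] = correct / len(items)
    return results


def order_spread(results: Dict[Tuple[int, ...], float]) -> float:
    """Gap between the best and worst choice order, as a fraction (0 to 1)."""
    return max(results.values()) - min(results.values())


# Toy usage: a hypothetical "model" with a pure position bias (always answers "A").
def always_a(prompt: str) -> str:
    return "A"


items = [
    MCItem("Placeholder question 1", ["correct", "d1", "d2", "d3"], "correct"),
    MCItem("Placeholder question 2", ["right", "x1", "x2", "x3"], "right"),
]
results = accuracy_by_order(items, always_a)
print(f"spread across orders: {order_spread(results):.0%}")  # prints "spread across orders: 100%"
```

In a real evaluation, `always_a` would be replaced by a call to the model under test plus parsing of its answer letter. With four choices there are 24 orderings; cyclic rotations of the choice list (four orderings) are a cheaper alternative when scoring every permutation is too costly.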
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
Methods: Softmax · Attention Is All You Need · Focus
