LLM Probe: Evaluating LLMs for Low-Resource Languages
Hailay Kidu Teklehaymanot, Gebrearegawi Gebremariam, Wolfgang Nejdl

TL;DR
This paper introduces LLM Probe, a systematic evaluation framework for assessing the linguistic capabilities of large language models in low-resource languages, using a new annotated benchmark dataset.
Contribution
The paper presents a novel lexicon-based assessment framework and a benchmark dataset for evaluating LLMs in low-resource, morphologically rich languages.
Findings
Sequence-to-sequence models perform better on morphosyntactic tasks and translation.
Causal models excel at lexical alignment but are weaker at translation.
High inter-annotator agreement validates the reliability of the dataset.
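As an illustration of what "inter-annotator agreement" can mean in practice, the sketch below computes Cohen's kappa for two annotators. This summary does not say which agreement statistic the authors used, so kappa is an assumption here, and the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[l] * counts_b[l] for l in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy part-of-speech labels (illustrative only, not from the paper's dataset).
annotator_1 = ["NOUN", "NOUN", "VERB", "VERB"]
annotator_2 = ["NOUN", "NOUN", "VERB", "NOUN"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well beyond chance; values near 0 indicate agreement no better than chance.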
Abstract
Despite rapid advances in large language models (LLMs), their linguistic abilities in low-resource and morphologically rich languages are still not well understood due to limited annotated resources and the absence of standardized evaluation frameworks. This paper presents LLM Probe, a lexicon-based assessment framework designed to systematically evaluate the linguistic skills of LLMs in low-resource language environments. The framework analyzes models across four areas of language understanding: lexical alignment, part-of-speech recognition, morphosyntactic probing, and translation accuracy. To illustrate the framework, we create a manually annotated benchmark dataset using a low-resource Semitic language as a case study. The dataset comprises bilingual lexicons with linguistic annotations, including part-of-speech tags, grammatical gender, and morphosyntactic features, which…
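To make the dataset description more concrete, here is a minimal sketch of how one annotated lexicon entry and a simple per-task accuracy score might be represented. The field names, the placeholder entries, and the exact-match scoring rule are all assumptions for illustration; the abstract does not specify the dataset schema or the metrics used.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One bilingual lexicon entry with linguistic annotations (hypothetical schema)."""
    source_word: str      # word in the low-resource language
    english_gloss: str    # its translation
    pos: str              # part-of-speech tag
    gender: str           # grammatical gender
    features: dict = field(default_factory=dict)  # other morphosyntactic features


def accuracy(predictions, gold):
    """Case-insensitive exact-match accuracy between two equal-length lists."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p.strip().lower() == g.strip().lower() for p, g in zip(predictions, gold))
    return correct / len(gold)


# Placeholder entries; the strings are stand-ins, not real dataset content.
lexicon = [
    LexiconEntry("word_1", "house", "NOUN", "masculine", {"number": "singular"}),
    LexiconEntry("word_2", "run", "VERB", "none", {"tense": "past"}),
]

# Hypothetical model outputs for two of the four evaluation areas.
predicted_pos = ["noun", "ADJ"]
predicted_gloss = ["house", "run"]

scores = {
    "pos_recognition": accuracy(predicted_pos, [e.pos for e in lexicon]),
    "lexical_alignment": accuracy(predicted_gloss, [e.english_gloss for e in lexicon]),
}
```

Keeping every annotation on the same entry lets a single lexicon supply the gold labels for several probing tasks at once.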
