XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
Linyang He, Ercong Nie, Sukru Samet Dindar, Arsalan Firoozi, Adrian Florea, Van Nguyen, Corentin Puffay, Riki Shimizu, Haotian Ye, Jonathan Brennan, Helmut Schmid, Hinrich Schütze, Nima Mesgarani

TL;DR
This paper introduces XCOMPS, a multilingual benchmark dataset for evaluating large language models' understanding of concepts across 17 languages, revealing strengths and limitations in their multilingual and morphological reasoning abilities.
Contribution
The work presents a new multilingual conceptual minimal pair dataset and comprehensive evaluation methods for LLMs' conceptual understanding across diverse languages.
Findings
LLMs perform worse on low-resource languages.
Performance drops markedly when the negative pair shares subtle semantic similarities.
Instruction tuning improves explicit task performance but does not enhance internal competence.
Abstract
We introduce XCOMPS in this work, a multilingual conceptual minimal pair dataset covering 17 languages. Using this dataset, we evaluate LLMs' multilingual conceptual understanding through metalinguistic prompting, direct probability measurement, and neurolinguistic probing. By comparing base, instruction-tuned, and knowledge-distilled models, we find that: 1) LLMs exhibit weaker conceptual understanding for low-resource languages, and accuracy varies across languages despite being tested on the same concept sets. 2) LLMs excel at distinguishing concept-property pairs that are visibly different but exhibit a marked performance drop when negative pairs share subtle semantic similarities. 3) Instruction tuning improves performance in concept understanding but does not enhance internal competence; knowledge distillation can enhance internal competence in conceptual understanding for…
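As a rough illustration of the direct probability measurement mentioned in the abstract, the sketch below scores a minimal pair as correct when the model assigns a higher log-probability to the acceptable sentence than to the unacceptable one. This is one common way to evaluate minimal pairs; the paper's exact scoring procedure is not given in this summary. The function names, the example sentences, and the scores are all hypothetical and are not taken from XCOMPS; in practice `log_prob` would wrap a real language model.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (acceptable sentence, unacceptable sentence)


def minimal_pair_accuracy(pairs: Iterable[Pair],
                          log_prob: Callable[[str], float]) -> float:
    """Fraction of pairs where the acceptable sentence scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


# Hypothetical sentence log-probabilities standing in for a real model.
# These numbers are invented for illustration only.
TOY_SCORES = {
    "A robin can fly.": -12.0,
    "A penguin can fly.": -15.5,
    "A knife is used for cutting.": -20.0,
    "A spoon is used for cutting.": -19.0,
}


def toy_log_prob(sentence: str) -> float:
    """Look up the made-up score for a sentence (assumption, not a model)."""
    return TOY_SCORES[sentence]


# Illustrative pairs: each negative swaps in a related concept.
PAIRS = [
    ("A robin can fly.", "A penguin can fly."),
    ("A knife is used for cutting.", "A spoon is used for cutting."),
]

accuracy = minimal_pair_accuracy(PAIRS, toy_log_prob)
print(f"accuracy = {accuracy:.2f}")
```

With these invented scores the toy scorer gets the first pair right and the second wrong, which mirrors the kind of failure the abstract describes for negatives that are semantically close to the positive.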
Taxonomy
Topics: Child and Animal Learning Development · Topic Modeling · Natural Language Processing Techniques
Methods: Knowledge Distillation
