XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs

Linyang He; Ercong Nie; Sukru Samet Dindar; Arsalan Firoozi; Adrian Florea; Van Nguyen; Corentin Puffay; Riki Shimizu; Haotian Ye; Jonathan Brennan; Helmut Schmid; Hinrich Schütze; Nima Mesgarani

arXiv:2502.19737·cs.CL·February 28, 2025

TL;DR

This paper introduces XCOMPS, a multilingual benchmark dataset for evaluating large language models' understanding of concepts across 17 languages, revealing strengths and limitations in their multilingual and morphological reasoning abilities.

Contribution

The work presents a new multilingual conceptual minimal pair dataset and comprehensive evaluation methods for LLMs' conceptual understanding across diverse languages.

Findings

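01. LLMs show weaker conceptual understanding in low-resource languages, and accuracy varies across languages on the same concept sets.

02. Performance drops markedly when negative pairs share subtle semantic similarities.

03. Instruction tuning improves task performance but does not enhance internal competence.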

Abstract

We introduce XCOMPS in this work, a multilingual conceptual minimal pair dataset covering 17 languages. Using this dataset, we evaluate LLMs' multilingual conceptual understanding through metalinguistic prompting, direct probability measurement, and neurolinguistic probing. By comparing base, instruction-tuned, and knowledge-distilled models, we find that: 1) LLMs exhibit weaker conceptual understanding for low-resource languages, and accuracy varies across languages despite being tested on the same concept sets. 2) LLMs excel at distinguishing concept-property pairs that are visibly different but exhibit a marked performance drop when negative pairs share subtle semantic similarities. 3) Instruction tuning improves performance in concept understanding but does not enhance internal competence; knowledge distillation can enhance internal competence in conceptual understanding for…
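As a rough illustration of the "direct probability measurement" setting mentioned in the abstract, a minimal-pair benchmark is commonly scored by checking whether a model assigns a higher score to the acceptable sentence than to its unacceptable counterpart. The sketch below assumes that convention; it is not taken from the paper. The function name `minimal_pair_accuracy`, the example sentences, and the numbers in `TOY_LOGPROBS` are all hypothetical placeholders standing in for real model log-probabilities.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical scores standing in for model log-probabilities.
# These numbers are invented for illustration, not measured results.
TOY_LOGPROBS = {
    "A robin can fly.": -12.0,
    "A penguin can fly.": -15.5,
    "A knife is used for cutting.": -14.0,
    "A spoon is used for cutting.": -13.2,
}

# Each pair is (acceptable sentence, unacceptable sentence).
PAIRS = [
    ("A robin can fly.", "A penguin can fly."),
    ("A knife is used for cutting.", "A spoon is used for cutting."),
]


def toy_score(sentence: str) -> float:
    """Look up a placeholder log-probability for a sentence."""
    return TOY_LOGPROBS[sentence]


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Return the fraction of pairs where the acceptable sentence scores higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


if __name__ == "__main__":
    print(minimal_pair_accuracy(PAIRS, toy_score))
```

In practice `score` would wrap a language model's summed token log-probabilities; passing it in as a callable keeps the accuracy computation independent of any particular model or library.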

Code & Models

Datasets

bbunzeck/babylm-german
dataset · 47 downloads

Videos

XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs · Underline

Taxonomy

Topics: Child and Animal Learning Development · Topic Modeling · Natural Language Processing Techniques

Methods: Knowledge Distillation