SemBench: A Universal Semantic Framework for LLM Evaluation

Mikel Zubillaga; Naiara Perez; Oscar Sainz; German Rigau

arXiv:2603.11687 · cs.CL · March 27, 2026

TL;DR

SemBench is a scalable, language-independent framework that automatically generates synthetic benchmarks to evaluate the semantic understanding of large language models across multiple languages.

Contribution

Introduces SemBench, a novel, resource-efficient method for assessing LLM semantic competence using dictionary definitions and sentence encoders, enabling cross-lingual evaluation.

Findings

1. SemBench model rankings correlate well with the rankings obtained on WiC datasets.
2. A small number of examples suffices to obtain stable model rankings.
3. The framework is scalable and language-independent.
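To make the first finding concrete, the sketch below compares how two benchmarks rank the same set of models. This excerpt does not state which correlation measure the authors use. Spearman's rank correlation is assumed here as a standard choice, and the helper names (`average_ranks`, `pearson`, `spearman`), model names and scores are hypothetical placeholders, not results from the paper.

```python
import math


def average_ranks(values):
    # Rank values from 1 (smallest) upward; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based positions in sorted order
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    # Pearson correlation; undefined when either input is constant.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sd_x * sd_y)


def spearman(scores_a, scores_b):
    # Spearman's rho: Pearson correlation of the two benchmarks' rank vectors,
    # computed over the models both benchmarks share.
    models = sorted(set(scores_a) & set(scores_b))
    ranks_a = average_ranks([scores_a[m] for m in models])
    ranks_b = average_ranks([scores_b[m] for m in models])
    return pearson(ranks_a, ranks_b)


# Hypothetical accuracies for four placeholder models (not results from the paper).
sembench_scores = {"model_a": 0.61, "model_b": 0.74, "model_c": 0.55, "model_d": 0.80}
wic_scores = {"model_a": 0.66, "model_b": 0.70, "model_c": 0.58, "model_d": 0.77}
print(round(spearman(sembench_scores, wic_scores), 3))  # 1.0
```

Ranks rather than raw scores are compared because the claim concerns agreement in model ordering. The same function could, in principle, compare a ranking computed on a small subset of items against the full-set ranking, which is one way to probe the second finding.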

Abstract

Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of these models remains a persistent challenge. Traditional benchmarks such as Word-in-Context (WiC) effectively probe this capability, but their creation is resource-intensive and often limited to high-resource languages. In this paper, we introduce SemBench, a framework for automatically generating synthetic benchmarks that assess the semantic competence of LLMs using only dictionary sense definitions and a sentence encoder. This approach eliminates the need for curated example sentences, making it both scalable and language-independent. We evaluate SemBench in three languages (English, Spanish, and Basque) spanning different…
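The abstract says SemBench needs only dictionary sense definitions and a sentence encoder, but this excerpt does not describe how the two are combined. The following is a minimal sketch under one assumption: that the encoder is used to check whether a sentence (for example, one produced by the model under test) lies closest to the intended sense definition. `toy_encode` is a hypothetical bag-of-words stand-in for a real sentence encoder, and the sense ids and definitions are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def toy_encode(text):
    # Hypothetical stand-in for a sentence encoder: a lowercase bag-of-words
    # count vector. A real encoder would return a dense embedding instead.
    return Counter(w.strip(".,;:!?").lower() for w in text.split())


def cosine(u, v):
    # Cosine similarity between two sparse count vectors (dict-like).
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def closest_sense(sentence, definitions, encode=toy_encode):
    # Return the sense id whose dictionary definition is most similar to the sentence.
    sent_vec = encode(sentence)
    return max(definitions, key=lambda sid: cosine(sent_vec, encode(definitions[sid])))


def sense_accuracy(items, definitions, encode=toy_encode):
    # Fraction of (sentence, intended_sense) pairs whose closest definition
    # is the intended one.
    hits = sum(closest_sense(s, definitions, encode) == target for s, target in items)
    return hits / len(items)


# Illustrative senses of "bank" (hypothetical ids and wording).
definitions = {
    "bank.n.1": "sloping land beside a body of water such as a river",
    "bank.n.2": "a financial institution that accepts deposits of money",
}
items = [
    ("We sat on the grassy land by the river", "bank.n.1"),
    ("She deposits money at the bank every week", "bank.n.2"),
]
print(sense_accuracy(items, definitions))  # 1.0
```

The lexical stand-in only works when sentence and definition share words. A real multilingual sentence encoder, passed through the `encode` argument (with `cosine` adapted to dense vectors), is what would let such a check work without curated example sentences in each language.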

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods