SemBench: A Universal Semantic Framework for LLM Evaluation
Mikel Zubillaga, Naiara Perez, Oscar Sainz, German Rigau

TL;DR
SemBench is a scalable, language-independent framework that automatically generates synthetic benchmarks to evaluate the semantic understanding of large language models across multiple languages.
Contribution
Introduces SemBench, a novel, resource-efficient method for assessing LLM semantic competence using dictionary definitions and sentence encoders, enabling cross-lingual evaluation.
Findings
Model rankings produced by SemBench correlate well with those obtained on WiC datasets.
Few examples suffice for stable model ranking.
Framework is scalable and language-independent.
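As an illustration of the first finding, the agreement between two benchmark rankings can be quantified with a rank correlation. The sketch below uses Spearman's rho, which is an assumption here: this summary does not say which correlation measure the authors use. The model names and scores are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List


def ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank models by score (1 = best), giving tied models their average rank."""
    ordered = sorted(scores, key=lambda m: scores[m], reverse=True)
    result: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[ordered[k]] = average_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman's rho: Pearson correlation computed on the two rank vectors."""
    if set(a) != set(b):
        raise ValueError("both benchmarks must score the same set of models")
    models = sorted(a)
    rank_a, rank_b = ranks(a), ranks(b)
    return pearson([rank_a[m] for m in models], [rank_b[m] for m in models])


# Hypothetical accuracies for four placeholder models (not from the paper).
sembench_scores = {"model_a": 0.81, "model_b": 0.74, "model_c": 0.62, "model_d": 0.55}
wic_scores = {"model_a": 0.72, "model_b": 0.69, "model_c": 0.58, "model_d": 0.60}

rho = spearman(sembench_scores, wic_scores)
print(f"Spearman rho = {rho:.2f}")
```

With these placeholder scores the two benchmarks disagree only on the order of the last two models. The same function could be applied to rankings computed from progressively smaller samples of benchmark items to check how few examples keep the ranking stable.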
Abstract
Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of these models remains a persistent challenge. Traditional benchmarks such as Word-in-Context (WiC) effectively probe this capability, but their creation is resource-intensive and often limited to high-resource languages. In this paper, we introduce SemBench, a framework for automatically generating synthetic benchmarks that assess the semantic competence of LLMs using only dictionary sense definitions and a sentence encoder. This approach eliminates the need for curated example sentences, making it both scalable and language-independent. We evaluate SemBench in three languages (English, Spanish, and Basque) spanning different…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
