Open Conversational LLMs do not know most Spanish words
Javier Conde, Miguel González, Nina Melero, Raquel Ferrando, Gonzalo Martínez, Elena Merino-Gómez, José Alberto Hernández, Pedro Reviriego

TL;DR
This paper evaluates open-source conversational LLMs' knowledge of Spanish words, revealing significant gaps in their understanding and usage, which highlights the need for more linguistically fair models across languages.
Contribution
It introduces a novel evaluation of open-source chat LLMs' Spanish vocabulary knowledge, exposing language disparities and emphasizing the importance of linguistic fairness.
Findings
Many Spanish words are assigned incorrect meanings by LLMs.
Most LLMs struggle to use Spanish words correctly in context.
Spanish is underrepresented in open-source LLM capabilities.
Abstract
The growing interest in Large Language Models (LLMs), and in particular in conversational models with which users can interact, has led to the development of a large number of open-source chat LLMs. These models are evaluated on a wide range of benchmarks to assess their capabilities in answering questions or solving problems on almost any possible topic, or to test their ability to reason or interpret texts. In contrast, the evaluation of the knowledge that these models have of languages, for example the words that they can recognize and use in different languages, has received much less attention. In this paper, we evaluate the knowledge that open-source chat LLMs have of Spanish words by testing a sample of words in a reference dictionary. The results show that open-source chat LLMs produce incorrect meanings for an important fraction of the words and are not able to use most of the…
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Spanish Linguistics and Language Studies
