Multilingual Definition Modeling
Edison Marrese-Taylor, Erica K. Shimomoto, Alfredo Solano, Enrique Reid

TL;DR
This study evaluates multilingual language models' ability to generate dictionary definitions across four languages, revealing their strengths, limitations, and potential as a resource-efficient alternative to traditional benchmarks.
Contribution
It is the first comprehensive multilingual study of definition modeling. It assesses both pre-trained multilingual models and LLMs, and it proposes a new evaluation approach whose results correlate with existing benchmarks.
Findings
Multilingual models achieve definition modeling performance on par with English.
LLMs show strong zero-shot and few-shot capabilities but have notable shortcomings.
Performance correlates with existing multilingual benchmark scores.
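The correlation finding can be checked with a rank correlation between two lists of scores. The summary does not say which statistic the authors used, so the choice of Spearman's rank correlation here is our assumption, and `average_ranks`, `pearson`, and `spearman` are hypothetical helpers. This is a minimal sketch, assuming one definition modeling score and one benchmark score per system (or per language). The numbers in the usage block are made-up placeholders, not results from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder scores for four hypothetical systems, NOT results from the paper.
    definition_scores = [0.1, 0.4, 0.3, 0.9]
    benchmark_scores = [10.0, 30.0, 25.0, 70.0]
    print(spearman(definition_scores, benchmark_scores))  # → 1.0
```

A rank correlation only asks whether systems are ordered the same way by both measures. That ordering is what matters when definition modeling is used as a cheaper stand-in for a larger benchmark.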
Abstract
In this paper, we propose the first multilingual study on definition modeling. We use monolingual dictionary data for four new languages (Spanish, French, Portuguese, and German) and perform an in-depth empirical study to test the performance of pre-trained multilingual language models on definition modeling of monosemic words when fine-tuned on this data. Furthermore, we use a zero-shot approach to test the multilingual capabilities of two popular chat-based Large Language Models (LLMs) on the task. Results show that multilingual language models can perform on par with English but cannot leverage potential cross-lingual synergies, with LLMs generally offering better performance. A comprehensive human evaluation of the LLM-generated definitions highlights the zero- and few-shot capabilities of these models in this new task, while also revealing their shortcomings. Finally, we show that…
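The zero- and few-shot setting described in the abstract comes down to how the LLM is prompted. The summary does not give the prompts the authors used. The template below, the `Shot` class, and the `build_prompt` and `format_item` helpers are therefore hypothetical: a minimal sketch of how a zero-shot prompt (instruction plus target word) differs from a few-shot one (the same prompt, preceded by solved examples). The optional usage sentence is also an assumption. The study targets monosemic words, where a context may not be needed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Shot:
    """A solved demonstration: a word, its definition, an optional usage sentence."""
    word: str
    definition: str
    context: Optional[str] = None


def format_item(word: str, context: Optional[str]) -> str:
    """Render the word (and its usage sentence, if any) in a fixed layout."""
    lines = [f"Word: {word}"]
    if context is not None:
        lines.append(f"Sentence: {context}")
    return "\n".join(lines)


def build_prompt(word: str, language: str, context: Optional[str] = None,
                 shots: Sequence[Shot] = ()) -> str:
    """Hypothetical template: zero-shot when `shots` is empty, few-shot otherwise."""
    blocks = [f"Write a concise dictionary definition, in {language}, of the given word."]
    for shot in shots:
        # Each demonstration shows the input followed by its answer.
        blocks.append(f"{format_item(shot.word, shot.context)}\nDefinition: {shot.definition}")
    # The target item ends with an open "Definition:" slot for the model to fill.
    blocks.append(f"{format_item(word, context)}\nDefinition:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Placeholder target word; prints a zero-shot prompt.
    print(build_prompt("<word>", "Portuguese"))
```

Passing an empty `shots` sequence gives the zero-shot variant. Adding demonstrations in the target language turns it into a few-shot prompt without changing the rest of the template.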
Taxonomy
Topics: Natural Language Processing Techniques
