TL;DR
This study evaluates how well static and contextualized models represent word meanings in context, focusing on homonymy and synonymy, using a new dataset that covers Galician, Portuguese, English, and Spanish.
Contribution
Introduces a multilingual dataset for the controlled evaluation of lexical-semantic relations such as homonymy and synonymy, and systematically assesses how well static and contextualized models, including Transformer-based ones, represent word senses in context.
Findings
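The best monolingual Transformer-based models adequately disambiguate homonyms in context.
Because they rely heavily on context, these models fail to represent words with different senses when those words occur in similar sentences.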
Experiments cover four languages (Galician, Portuguese, English, and Spanish) and four evaluation scenarios, using a newly created dataset.
Abstract
This paper presents a multilingual study of word meaning representations in context. We assess the ability of both static and contextualized models to adequately represent different lexical-semantic relations, such as homonymy and synonymy. To do so, we created a new multilingual dataset that allows us to perform a controlled evaluation of several factors such as the impact of the surrounding context or the overlap between words, conveying the same or different senses. A systematic assessment on four scenarios shows that the best monolingual models based on Transformers can adequately disambiguate homonyms in context. However, as they rely heavily on context, these models fail at representing words with different senses when occurring in similar sentences. Experiments are performed in Galician, Portuguese, English, and Spanish, and both the dataset (with more than 3,000 evaluation…
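To make the kind of check described in the abstract more concrete, here is a minimal sketch of how one might test whether a model's vectors separate two senses of a homonym. It assumes cosine similarity as the comparison and uses made-up three-dimensional toy vectors. The function names, the vectors, and the decision rule are all illustrative assumptions, not the paper's actual protocol or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def same_sense_is_closer(anchor, same_sense, other_sense):
    """Hypothetical decision rule: the representation counts as adequate
    if the anchor is more similar to the same-sense occurrence than to
    the occurrence that conveys a different sense."""
    return cosine(anchor, same_sense) > cosine(anchor, other_sense)


# Toy, made-up vectors standing in for contextualized embeddings of the
# homonym "bank" in three sentences (assumption: not real model output).
bank_river_1 = [0.9, 0.1, 0.0]
bank_river_2 = [0.8, 0.2, 0.1]
bank_money = [0.1, 0.9, 0.2]

result = same_sense_is_closer(bank_river_1, bank_river_2, bank_money)
print(result)
```

In a real experiment the toy vectors would be replaced by embeddings extracted from a static or contextualized model, and the rule would be applied to every evaluation item in the dataset.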
