XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Alessandro Raganato; Tommaso Pasini; Jose Camacho-Collados; Mohammad Taher Pilehvar

arXiv:2010.06478·cs.CL·October 14, 2020

TL;DR

XL-WiC is a multilingual benchmark that extends the English Word-in-Context (WiC) task to 12 new languages. Because the task is framed as binary classification rather than tied to a sense inventory, it supports evaluation scenarios such as zero-shot cross-lingual transfer.

Contribution

It presents the first large-scale multilingual WiC dataset, facilitating evaluation beyond English and enabling zero-shot cross-lingual transfer experiments.

Findings

01 Models trained on English data perform well on distant languages.

02 XL-WiC covers 12 diverse languages, expanding evaluation scope.

03 Multilingual models show competitive performance even without target language training data.

Abstract

The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing evaluation benchmarks for assessing this criterion are tied to sense inventories (usually WordNet), restricting their usage to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) addresses the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem; but, it is limited to the English language. We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability, opening room for evaluation scenarios such as zero-shot cross-lingual transfer. We perform a series of experiments to determine the reliability of the datasets and…
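The binary formulation described in the abstract can be illustrated with a minimal sketch: each instance pairs a target word with two contexts, and the label says whether the word carries the same meaning in both. The field names, the `accuracy` helper, the toy English examples, and the trivial baseline below are all hypothetical, chosen for illustration; they are not the released XL-WiC file format or the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class WiCInstance:
    """One Word-in-Context item (hypothetical field names)."""
    target: str      # the word whose meaning is being compared
    context_1: str   # first sentence containing the target word
    context_2: str   # second sentence containing the target word
    label: bool      # True if the target has the same meaning in both


def accuracy(instances: Sequence[WiCInstance],
             predict: Callable[[WiCInstance], bool]) -> float:
    """Fraction of instances where the binary prediction matches the label."""
    if not instances:
        raise ValueError("need at least one instance")
    correct = sum(predict(inst) == inst.label for inst in instances)
    return correct / len(instances)


def always_same(instance: WiCInstance) -> bool:
    """Trivial baseline (an assumption for illustration): always answer True."""
    return True


# Toy examples invented for this sketch, not taken from the dataset.
examples = [
    WiCInstance("bank", "She sat on the bank of the river.",
                "He deposited cash at the bank.", False),
    WiCInstance("bank", "The bank approved the loan.",
                "He deposited cash at the bank.", True),
]

print(accuracy(examples, always_same))
```

Under this framing, a zero-shot cross-lingual setup would fit `predict` on instances from one language and call `accuracy` on instances from another; no sense inventory is needed at either step.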

Code & Models

Repositories

pasinit/xlwic-runs (PyTorch)
