Multilingual Substitution-based Word Sense Induction

Denis Kokosinskii; Nikolay Arefyev

arXiv:2405.11086 · cs.CL · May 21, 2024

TL;DR

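This paper introduces multilingual substitution-based Word Sense Induction (WSI) methods that support any of the 100 languages covered by the underlying multilingual language model with minimal to no adaptation. On popular English WSI datasets they perform on par with existing monolingual approaches, and they are aimed mainly at lower-resourced languages that lack the lexical resources available for English.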

Contribution

The paper presents multilingual substitution-based WSI methods that require minimal to no adaptation for a new language, whereas earlier approaches developed for English and a few other languages are not easily adapted.

Findings

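01. The methods perform on par with existing monolingual approaches on popular English WSI datasets.

02. The methods are expected to be most useful for lower-resourced languages, which lack the lexical resources available for English.

03. Any of the 100 languages covered by the underlying multilingual language model is supported with minimal to no adaptation.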

Abstract

Word Sense Induction (WSI) is the task of discovering the senses of an ambiguous word by grouping usages of this word into clusters corresponding to these senses. Many approaches have been proposed to solve WSI in English and a few other languages, but these approaches are not easily adaptable to new languages. We present multilingual substitution-based WSI methods that support any of the 100 languages covered by the underlying multilingual language model with minimal to no adaptation required. Despite their multilingual capabilities, our methods perform on par with the existing monolingual approaches on popular English WSI datasets. At the same time, they will be most useful for lower-resourced languages, which lack the lexical resources available for English and thus have a higher demand for unsupervised methods like WSI.
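To make the task definition concrete, here is a minimal sketch of the general substitution-based idea: each usage of the target word is represented by a bag of lexical substitutes, and usages with similar substitutes are grouped into one sense cluster. It is not the authors' implementation. The substitutes are hard-coded hypothetical values, whereas the abstract indicates they would come from a multilingual language model, and the function names `cosine` and `induce_senses`, the average-linkage clustering, and the fixed number of senses are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-substitutes vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def induce_senses(substitutes: list[list[str]], n_senses: int) -> list[int]:
    """Group usages into n_senses clusters (average-linkage agglomerative).

    substitutes[i] holds the substitutes generated for the i-th usage of the
    target word. Returns one cluster label per usage.
    """
    vectors = [Counter(s) for s in substitutes]
    clusters = [[i] for i in range(len(vectors))]  # start with singletons
    while len(clusters) > max(n_senses, 1):
        best = None  # (similarity, index_x, index_y) of the closest pair
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[x] for j in clusters[y]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x].extend(clusters.pop(y))  # merge the closest pair
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical substitutes for four usages of the ambiguous word "bank".
usages = [
    ["lender", "institution", "firm"],     # "the bank raised interest rates"
    ["institution", "lender", "company"],  # "she opened an account at the bank"
    ["shore", "edge", "side"],             # "we sat on the bank of the river"
    ["shore", "side", "slope"],            # "the bank was covered in reeds"
]
labels = induce_senses(usages, n_senses=2)
print(labels)  # → [0, 0, 1, 1]
```

In this sketch, language-specific knowledge enters only through the substitutes. That is one reason a substitute generator covering many languages could make the pipeline portable, although the amount of adaptation the paper's methods need per language is not described in the text above.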

Taxonomy

Topics: Natural Language Processing Techniques