To Word Senses and Beyond: Inducing Concepts with Contextualized Language Models
Bastien Liétard, Pascal Denis, Mikaela Keller

TL;DR
This paper introduces Concept Induction, the unsupervised task of soft-clustering words into shared concepts, and proposes a bi-level approach to it whose concept embeddings perform competitively on the Word-in-Context task.
Contribution
It proposes a novel bi-level approach to unsupervised Concept Induction that combines local and global views, generalizing Word Sense Induction and producing effective concept embeddings.
Findings
Achieved BCubed F1 above 0.60 on SemCor data
Local and global levels mutually improve concept and sense induction
Concept embeddings perform competitively on Word-in-Context task
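For readers unfamiliar with the BCubed metric cited in the findings, the following is a minimal sketch of BCubed precision, recall, and F1 for two hard clusterings of the same items. It is the standard textbook formulation, not the paper's evaluation code; the function name `bcubed` and the toy labels are illustrative assumptions, and the paper's exact evaluation protocol may differ.

```python
from collections import Counter


def bcubed(pred, gold):
    """Return (precision, recall, F1) for two hard clusterings.

    pred[i] and gold[i] are the predicted and reference cluster labels
    of item i. Hypothetical helper, not taken from the paper.
    """
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and equally long")

    n = len(pred)
    pred_sizes = Counter(pred)        # size of each predicted cluster
    gold_sizes = Counter(gold)        # size of each reference cluster
    joint = Counter(zip(pred, gold))  # items sharing both labels

    # Per-item precision: share of the item's predicted cluster that
    # also shares its reference label (the item itself included).
    precision = sum(joint[(p, g)] / pred_sizes[p] for p, g in zip(pred, gold)) / n
    # Per-item recall: share of the item's reference cluster that
    # also shares its predicted label.
    recall = sum(joint[(p, g)] / gold_sizes[g] for p, g in zip(pred, gold)) / n

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: five occurrences, two predicted clusters, two reference concepts.
toy_pred = [0, 0, 0, 1, 1]
toy_gold = ["a", "a", "b", "b", "b"]
toy_scores = bcubed(toy_pred, toy_gold)
```

Because every item counts itself, precision and recall are strictly positive, so the F1 denominator is never zero.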
Abstract
Polysemy and synonymy are two crucial interrelated facets of lexical ambiguity. While both phenomena are widely documented in lexical resources and have been studied extensively in NLP, leading to dedicated systems, they are often considered independently in practical problems. While many tasks dealing with polysemy (e.g. Word Sense Disambiguation or Induction) highlight the role of a word's senses, the study of synonymy is rooted in the study of concepts, i.e. meanings shared across the lexicon. In this paper, we introduce Concept Induction, the unsupervised task of learning a soft clustering among words that defines a set of concepts directly from data. This task generalizes Word Sense Induction. We propose a bi-level approach to Concept Induction that leverages both a local lemma-centric view and a global cross-lexicon view to induce concepts. We evaluate the obtained clustering…
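To illustrate what a soft clustering among words can look like, here is a toy sketch in which each word occurrence has already been assigned to one concept. Aggregating occurrences per lemma gives each word a weighted membership in several concepts, and restricting the view to a single lemma gives a sense inventory in the style of Word Sense Induction. The occurrence-level assignment, the concept labels, and the helper names are all assumptions made for illustration; this is not the paper's bi-level algorithm.

```python
from collections import Counter, defaultdict


def soft_memberships(occurrences):
    """Map each lemma to {concept: fraction of its occurrences}.

    occurrences is a list of (lemma, concept_id) pairs. Hypothetical
    helper for illustration only.
    """
    counts = defaultdict(Counter)
    for lemma, concept in occurrences:
        counts[lemma][concept] += 1
    return {
        lemma: {c: k / sum(cnt.values()) for c, k in cnt.items()}
        for lemma, cnt in counts.items()
    }


def senses_of(lemma, occurrences):
    """Local, lemma-centric view: the concepts one lemma takes part in."""
    return sorted({c for l, c in occurrences if l == lemma})


def lemmas_of(concept, occurrences):
    """Global, cross-lexicon view: the lemmas that share one concept."""
    return sorted({l for l, c in occurrences if c == concept})


# Made-up data: "bank" is polysemous and shares a concept with "shore".
toy_occurrences = [
    ("bank", "C_finance"),
    ("bank", "C_finance"),
    ("bank", "C_river"),
    ("shore", "C_river"),
]
toy_memberships = soft_memberships(toy_occurrences)
```

In this toy data, polysemy appears as one lemma spread over several concepts, and synonymy as several lemmas sharing one concept.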
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
