Multi-Sense Language Modelling

Andrea Lekkas; Peter Schneider-Kamp; Isabelle Augenstein

arXiv:2012.05776·cs.CL·June 2, 2022

TL;DR

This paper introduces a multi-sense language model that predicts both the next word and its sense in context, with the aim of improving language understanding and enabling more precise linking of language models with knowledge bases.

Contribution

It proposes a structured prediction framework that decomposes the task into next-word prediction followed by sense prediction, with a Graph Attention Network aiding the sense prediction step, so that polysemy is modelled explicitly.
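To make the Graph Attention Network component more concrete, here is a minimal single-head graph attention layer following the standard GAT formulation, written in pure Python. It is a sketch under stated assumptions, not the authors' implementation: the toy graph (a sense node linked to a definition node and an example node), the feature values, and the helper names `gat_layer`, `matvec`, `leaky_relu` and `elu` are all hypothetical, and a real model would learn `W` and `a` with a tensor library.

```python
import math


def leaky_relu(x, slope=0.2):
    # LeakyReLU with the 0.2 negative slope commonly used in GAT.
    return x if x > 0 else slope * x


def elu(x):
    # ELU output nonlinearity.
    return x if x > 0 else math.exp(x) - 1.0


def matvec(W, h):
    # Multiply a weight matrix (list of rows) by a feature vector.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbours, W, a):
    """Single-head graph attention layer (hypothetical helper).

    features:   node -> input feature vector
    neighbours: node -> nodes it attends to (include the node itself as a self-loop)
    W:          shared projection matrix with out_dim rows
    a:          attention vector of length 2 * out_dim
    """
    projected = {n: matvec(W, h) for n, h in features.items()}
    out = {}
    for i, nbrs in neighbours.items():
        # Unnormalised score e_ij = LeakyReLU(a . [W h_i || W h_j]).
        scores = [
            leaky_relu(sum(ak * x for ak, x in zip(a, projected[i] + projected[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood, max-shifted for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        # Attention-weighted sum of the projected neighbour features.
        agg = [0.0] * len(W)
        for e, j in zip(exps, nbrs):
            for k, v in enumerate(projected[j]):
                agg[k] += (e / z) * v
        out[i] = [elu(v) for v in agg]
    return out


# Toy graph (hypothetical): a sense node attends to itself, a definition and an example use.
features = {"sense": [1.0, 0.0], "definition": [0.0, 1.0], "example": [0.5, 0.5]}
neighbours = {"sense": ["sense", "definition", "example"]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 0.0, 0.0]
print(gat_layer(features, neighbours, W, a)["sense"])
```

With an all-zero attention vector every neighbour gets the same weight, so the sense node's output is simply the ELU of the mean of its neighbours' projected features; a learned `a` would instead let the sense representation weight its definition and examples unequally.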

Findings

01

Multi-sense modeling is highly challenging.

02

Standard architectures are insufficient for sense prediction.

03

Future datasets are needed for progress.

Abstract

The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy). Currently, none of the common language modelling architectures explicitly model polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction task followed by a sense prediction task. To aid sense prediction, we utilise a Graph Attention Network, which encodes definitions and example…
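The word-then-sense decomposition can be illustrated with the chain rule P(w, s | c) = P(w | c) · P(s | w, c): the model first chooses a next word w for context c, then a sense s conditioned on that word. The following is a minimal greedy sketch, not the authors' code. Restricting the sense step to the predicted word's senses is an assumption made here, and the logits, sense inventory, WordNet-style sense labels and function names (`softmax`, `predict_word_then_sense`) are hypothetical.

```python
import math


def softmax(logits):
    # Max-shifted softmax over a dict of label -> logit.
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def predict_word_then_sense(word_logits, sense_logits, sense_inventory):
    """Greedy two-step prediction (hypothetical helper).

    Step 1 picks the most probable next word; step 2 picks a sense, but only
    among the senses listed for that word in sense_inventory (an assumption).
    Returns (word, sense, P(word) * P(sense | word)).
    """
    p_word = softmax(word_logits)
    word = max(p_word, key=p_word.get)
    candidates = {s: sense_logits[s] for s in sense_inventory[word]}
    p_sense = softmax(candidates)
    sense = max(p_sense, key=p_sense.get)
    return word, sense, p_word[word] * p_sense[sense]


# Toy scores for one context; the WordNet-style sense labels are illustrative only.
word_logits = {"bank": 2.0, "river": 0.5}
sense_logits = {"bank.n.01": 0.3, "bank.n.02": 1.2, "river.n.01": 0.0}
sense_inventory = {"bank": ["bank.n.01", "bank.n.02"], "river": ["river.n.01"]}
print(predict_word_then_sense(word_logits, sense_logits, sense_inventory))
```

One appeal of this design, under the restriction assumed here, is that the sense softmax only ranges over the handful of senses of one word rather than over the whole sense inventory.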
