Topics in Contextualised Attention Embeddings

Mozhgan Talebpour; Alba Garcia Seco de Herrera; Shoaib Jameel

arXiv:2301.04339 · cs.CL · January 12, 2023 · 1 citation

Open Access

TL;DR

This paper investigates how contextualised word embeddings from models like BERT implicitly form topical clusters, revealing the role of the attention mechanism in this process through probing experiments.

Contribution

It demonstrates that the attention framework in pre-trained language models is crucial for forming word topic clusters without explicit topic modeling.

Findings

01. Attention mechanisms are key to topical clustering in embeddings.

02. Clustering on contextual representations emulates latent topic structures.

03. Probing experiments reveal the implicit formation of word topics.
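
For readers unfamiliar with the attention mechanism these findings refer to, the following is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is not the authors' probing setup and not BERT itself: the weight matrices are random, and the names `self_attention`, `Wq`, `Wk`, `Wv` and the sizes are assumptions chosen for illustration. It only shows how each token's output vector becomes a weighted mix of every token in the sequence, which is the step that makes a representation "contextualised".

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative only).

    X: (n_tokens, d_model) input vectors.
    Wq, Wk, Wv: (d_model, d_head) projection matrices.
    Returns the contextualised vectors and the attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (n_tokens, n_tokens)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

# Random stand-ins for token vectors and learned projections (assumed sizes).
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 8, 4
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

context, weights = self_attention(X, Wq, Wk, Wv)
```

Row `i` of `weights` says how much token `i` draws on each other token, so `context[i]` depends on the whole sequence rather than on token `i` alone.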

Abstract

Contextualised word vectors obtained via pre-trained language models encode a variety of knowledge that has already been exploited in applications. Complementary to these language models are probabilistic topic models that learn thematic patterns from the text. Recent work has demonstrated that conducting clustering on the word-level contextual representations from a language model emulates word clusters that are discovered in latent topics of words from Latent Dirichlet Allocation. The important question is how such topical word clusters are automatically formed, through clustering, in the language model when it has not been explicitly designed to model latent topics. To address this question, we design different probe experiments. Using BERT and DistilBERT, we find that the attention framework plays a key role in modelling such word topic clusters. We strongly believe that our work…
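
The clustering step the abstract describes can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's pipeline: the six vectors are hand-written stand-ins for contextual word embeddings (in the paper they would come from BERT or DistilBERT), the word list is invented, and plain k-means with a deterministic farthest-point initialisation is an assumed choice of clustering method. The helper names `farthest_point_init` and `kmeans` are hypothetical.

```python
import numpy as np

def farthest_point_init(X, k):
    # Start from the first vector, then repeatedly add the vector that is
    # farthest from all centroids chosen so far (deterministic).
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy stand-ins for contextual word vectors (NOT real model outputs).
words = ["goal", "match", "striker", "bank", "loan", "interest"]
X = np.array([
    [2.9, 0.1, 0.0], [3.1, -0.2, 0.1], [3.0, 0.0, -0.1],    # sports-like
    [-3.0, 0.2, 0.1], [-2.8, -0.1, 0.0], [-3.2, 0.0, -0.1],  # finance-like
])

labels, _ = kmeans(X, k=2)
clusters = {j: [w for w, l in zip(words, labels) if l == j] for j in range(2)}
```

Each entry of `clusters` is a list of words whose vectors lie close together, which is the sense in which such clusters resemble the word lists of an LDA topic.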

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Adam · Residual Connection · Dense Connections · Layer Normalization · WordPiece · Attention Dropout · Weight Decay · Linear Layer