Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings

Jacob Turton; David Vinson; Robert Elliott Smith

arXiv:2012.15353·cs.CL·January 1, 2021

TL;DR

This paper introduces a method to derive contextualized semantic features from BERT embeddings, enhancing interpretability of word meanings in context and revealing how semantic features are represented across BERT's layers.

Contribution

It demonstrates that Binder semantic features can be extracted from BERT embeddings, enabling contextualized semantic analysis and improving interpretability of transformer-based models.

Findings

01

Binder features can be derived from BERT embeddings.

02

Semantic features vary across BERT layers.

03

Contextualized semantic features improve interpretability.

Abstract

Models based on the transformer architecture, such as BERT, have marked a crucial step forward in the field of Natural Language Processing. Importantly, they allow the creation of word embeddings that capture important semantic information about words in context. However, as single entities, these embeddings are difficult to interpret and the models used to create them have been described as opaque. Binder and colleagues proposed an intuitive embedding space where each dimension is based on one of 65 core semantic features. Unfortunately, the space only exists for a small dataset of 535 words, limiting its uses. Previous work (Utsumi, 2018, 2020; Turton, Vinson & Smith, 2020) has shown that Binder features can be derived from static embeddings and successfully extrapolated to a large new vocabulary. Taking the next step, this paper demonstrates that Binder features can be derived from…
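
The general idea of deriving Binder features from embeddings can be illustrated as a regression from an embedding vector to the 65 feature ratings, fitted on the 535 rated words and then applied to unseen words. The following is a minimal sketch only: the closed-form ridge regression, the helper name `fit_ridge`, the 768-dimensional embedding size, and the random placeholder data are all assumptions made for illustration, and the excerpt above does not state which model the authors actually use.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression (no intercept).

    Solves (X^T X + alpha * I) W = X^T Y for W.
    Hypothetical helper for illustration, not the paper's model.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


rng = np.random.default_rng(0)

# 535 words and 65 features come from the abstract;
# 768 is an assumed embedding width (e.g. a BERT-base hidden size).
n_words, emb_dim, n_features = 535, 768, 65

# Random placeholders standing in for real embeddings and Binder ratings.
X_train = rng.normal(size=(n_words, emb_dim))
Y_train = rng.uniform(size=(n_words, n_features))

# Learn a linear map from embedding space to the 65-feature space.
W = fit_ridge(X_train, Y_train, alpha=1.0)

# Predict feature vectors for embeddings of words outside the rated set.
X_new = rng.normal(size=(10, emb_dim))
Y_pred = X_new @ W
```

With real data, each row of `X_train` would be the embedding of a rated word (for a contextual model, an embedding of that word in some sentence) and each row of `Y_pred` would be a predicted 65-dimensional feature vector that can be read dimension by dimension.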

Taxonomy

Methods: Linear Layer · Dropout · Softmax · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · Attention Is All You Need · Layer Normalization · WordPiece