Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection

Nayoung Choi

arXiv:2310.05115·cs.CL·October 10, 2023

TL;DR

This paper proposes a method for interpreting word semantics in BERT: it selects embedding dimensions layer by layer to isolate semantic sense, without retraining the model.

Contribution

It introduces a layer-wise binary masking technique that disentangles semantic sense from BERT embeddings, with the aim of improving interpretability and performance on word-sense similarity classification.

Findings

01

Layer-wise information is effective for semantic interpretation.

02

Disentangling semantic sense improves classification performance.

03

The method does not require updating the pre-trained parameters.

Abstract

Contextual word embeddings obtained from pre-trained language models (PLMs) have proven effective for various natural language processing tasks at the word level. However, interpreting the hidden aspects within embeddings, such as syntax and semantics, remains challenging. Disentangled representation learning, which separates specific aspects into distinct embeddings, has emerged as a promising approach. Furthermore, different linguistic knowledge is believed to be stored in different layers of a PLM. This paper aims to disentangle semantic sense from BERT by applying a binary mask to the intermediate outputs across the layers, without updating the pre-trained parameters. The disentangled embeddings are evaluated through binary classification to determine whether the target word in two different sentences has the same meaning. Experiments with cased BERT$_{base}$ show that leveraging layer-wise…
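As a rough illustration of the idea in the abstract, the sketch below applies a hard 0/1 mask to each layer's output for a target word and then compares two masked embeddings. It is not the paper's implementation: the random arrays stand in for BERT hidden states, the mask logits are random rather than learned, and summing over layers, the cosine score, and the 0.5 threshold are all assumptions made for the example.

```python
import numpy as np

NUM_LAYERS = 12  # BERT-base has 12 transformer layers
HIDDEN = 768     # BERT-base hidden size


def apply_layerwise_mask(layer_outputs, mask_logits):
    """Keep only the dimensions selected in each layer, then sum over layers.

    layer_outputs: (NUM_LAYERS, HIDDEN) hidden states of one target word.
    mask_logits:   (NUM_LAYERS, HIDDEN) scores; a positive score keeps the dimension.
    Summing the masked layers is an assumption of this sketch.
    """
    mask = (mask_logits > 0).astype(layer_outputs.dtype)  # hard binary mask
    return (layer_outputs * mask).sum(axis=0)


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


rng = np.random.default_rng(0)
# Placeholder hidden states for the same target word in two sentences.
word_a = rng.normal(size=(NUM_LAYERS, HIDDEN))
word_b = rng.normal(size=(NUM_LAYERS, HIDDEN))
# Placeholder mask scores; in practice these would be trained while BERT stays frozen.
mask_logits = rng.normal(size=(NUM_LAYERS, HIDDEN))

emb_a = apply_layerwise_mask(word_a, mask_logits)
emb_b = apply_layerwise_mask(word_b, mask_logits)
score = cosine(emb_a, emb_b)
same_sense = score > 0.5  # hypothetical decision threshold
```

Because only the mask would be trained, the pre-trained weights that produce `word_a` and `word_b` stay untouched, which is the property the abstract emphasizes.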

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Weight Decay · Softmax · Linear Warmup With Linear Decay · WordPiece · Attention Dropout