Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection
Nayoung Choi

TL;DR
This paper proposes a method to interpret word semantics in BERT by layer-wise dimension selection, improving semantic understanding without retraining the model.
Contribution
It introduces a layer-wise binary masking technique that disentangles semantic sense from BERT embeddings, improving interpretability and performance on semantic similarity tasks.
Findings
Layer-wise information is effective for semantic interpretation.
Disentangling semantic sense improves classification performance.
The method does not require updating pre-trained parameters.
Abstract
Contextual word embeddings obtained from pre-trained language models (PLMs) have proven effective for various natural language processing tasks at the word level. However, interpreting the hidden aspects within embeddings, such as syntax and semantics, remains challenging. Disentangled representation learning, which separates specific aspects into distinct embeddings, has emerged as a promising approach. Furthermore, different linguistic knowledge is believed to be stored in different layers of a PLM. This paper aims to disentangle semantic sense from BERT by applying a binary mask to the intermediate outputs across the layers, without updating pre-trained parameters. The disentangled embeddings are evaluated through binary classification to determine whether the target word in two different sentences has the same meaning. Experiments with cased BERT show that leveraging layer-wise…
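The masking idea in the abstract can be illustrated with a small toy sketch. It is not the paper's implementation: the thresholding rule, the averaging across layers, the cosine-based decision, and every function name below are assumptions made for illustration. The per-layer vectors stand in for frozen BERT outputs, and the mask scores are set by hand because this summary does not describe how the mask is trained.

```python
import math

def binarize(mask_scores, threshold=0.0):
    """Turn real-valued mask scores into a hard 0/1 mask (assumed rule)."""
    return [1.0 if s > threshold else 0.0 for s in mask_scores]

def masked_layer_embedding(layer_outputs, layer_mask_scores):
    """Apply one binary mask per layer, then average the masked vectors.

    layer_outputs: per-layer hidden vectors for the target word. They are
        treated as fixed inputs, so no encoder parameter is updated.
    layer_mask_scores: per-layer mask scores with the same shape.
    Averaging over layers is an assumption, not taken from the paper.
    """
    dim = len(layer_outputs[0])
    pooled = [0.0] * dim
    for hidden, scores in zip(layer_outputs, layer_mask_scores):
        mask = binarize(scores)
        for i in range(dim):
            pooled[i] += mask[i] * hidden[i]
    return [v / len(layer_outputs) for v in pooled]

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def same_sense(word_a_layers, word_b_layers, mask_scores, threshold=0.5):
    """Binary decision: do the two occurrences share a sense? (assumed rule)"""
    a = masked_layer_embedding(word_a_layers, mask_scores)
    b = masked_layer_embedding(word_b_layers, mask_scores)
    return cosine(a, b) >= threshold

# Toy data: 2 layers, 4 dimensions. Dimensions 0-1 play the role of
# "sense" dimensions; dimensions 2-3 carry unrelated variation.
word_a = [[1.0, 0.0, 5.0, 0.0], [0.0, 2.0, 0.0, -3.0]]
word_b = [[1.0, 0.0, -5.0, 0.0], [0.0, 2.0, 0.0, 3.0]]
sense_mask = [[2.0, 1.0, -1.0, -1.0], [1.0, 2.0, -2.0, -0.5]]
keep_all = [[1.0] * 4, [1.0] * 4]

with_mask = same_sense(word_a, word_b, sense_mask)
without_mask = same_sense(word_a, word_b, keep_all)
```

In this toy setup the two occurrences agree on the selected dimensions and disagree on the rest, so the decision depends on which dimensions each layer's mask keeps.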
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Weight Decay · Softmax · Linear Warmup With Linear Decay · WordPiece · Attention Dropout
