Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers
Soniya Vijayakumar, Josef van Genabith, Simon Ostermann

TL;DR
This paper investigates how the sub-layers of a pre-trained language model encode contextual information for polysemous words. It finds that strong contextualization in the top sub-layers appears only at specific word positions and varies with context length and richness.
Contribution
It introduces a detailed probing methodology to localize contextualization strength across PLM sub-layers, focusing on polysemous words and varying context lengths.
Findings
Contextualization is high in the top sub-layers, but only for specific word positions
Context length and richness influence the degree of contextualization
Contextualization does not generalize systematically across word positions
Abstract
In the era of high-performing Large Language Models, researchers have widely acknowledged that contextual word representations are one of the key drivers in achieving top performance in downstream tasks. In this work, we investigate the degree of contextualization encoded in the fine-grained sub-layer representations of a Pre-trained Language Model (PLM) by empirical experiments using linear probes. Unlike previous work, we are particularly interested in identifying the strength of contextualization across PLM sub-layer representations (i.e., Self-Attention, Feed-Forward Activation and Output sub-layers). To identify the main contributions of sub-layers to contextualization, we first extract the sub-layer representations of polysemous words in minimally different sentence pairs, and compare how these representations change through the forward pass of the PLM network. Second, by probing…
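The first step of the abstract (comparing a polysemous word's sub-layer representations across a minimally different sentence pair) can be illustrated with a minimal sketch. It is not the authors' setup: the abstract does not say which comparison measure is used, so cosine distance is assumed here as a stand-in, and the linear-probing step is not shown. The function names, the dictionary keys and the three-dimensional toy vectors are all hypothetical; real representations would be extracted from the PLM.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def contextualization_profile(reps_a, reps_b):
    """Map each sub-layer name to the cosine distance (1 - similarity)
    between the target word's representation in sentence A and sentence B.

    A larger distance means the two contexts pushed the word's
    representation further apart at that sub-layer.
    """
    if reps_a.keys() != reps_b.keys():
        raise ValueError("both sentences must provide the same sub-layers")
    return {
        name: 1.0 - cosine_similarity(reps_a[name], reps_b[name])
        for name in reps_a
    }


# Toy stand-ins (assumption): one vector per sub-layer for the same
# polysemous word in two minimally different sentences.
reps_sentence_a = {
    "self_attention": [1.0, 0.0, 0.0],
    "ffn_activation": [1.0, 1.0, 0.0],
    "output": [0.0, 1.0, 0.0],
}
reps_sentence_b = {
    "self_attention": [1.0, 0.0, 0.0],
    "ffn_activation": [1.0, 0.0, 0.0],
    "output": [1.0, 0.0, 0.0],
}

profile = contextualization_profile(reps_sentence_a, reps_sentence_b)
for name, distance in profile.items():
    print(f"{name}: {distance:.3f}")
```

With the toy vectors, the distance is zero at the first sub-layer and largest at the last. That pattern is built into the example and says nothing about the paper's results.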
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Attention Dropout · Dense Connections · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay
