Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers

Soniya Vijayakumar; Josef van Genabith; Simon Ostermann

arXiv:2409.14097 · cs.CL · September 24, 2024

TL;DR

This paper investigates how the sub-layers of a pre-trained language model encode contextual information for polysemous words. It finds that strong contextualization in the top sub-layers depends on word position and on the length and richness of the context, and does not generalize systematically across positions.

Contribution

The paper introduces a probing methodology that localizes contextualization strength across PLM sub-layers, focusing on polysemous words and varying context lengths.

Findings

01. High contextualization in top sub-layers for specific word positions

02. Context length and richness influence the degree of contextualization

03. Contextualization does not systematically generalize across positions

Abstract

In the era of high-performing Large Language Models, researchers have widely acknowledged that contextual word representations are one of the key drivers in achieving top performance in downstream tasks. In this work, we investigate the degree of contextualization encoded in the fine-grained sub-layer representations of a Pre-trained Language Model (PLM) through empirical experiments using linear probes. Unlike previous work, we are particularly interested in identifying the strength of contextualization across PLM sub-layer representations (i.e. Self-Attention, Feed-Forward Activation and Output sub-layers). To identify the main contributions of sub-layers to contextualization, we first extract the sub-layer representations of polysemous words in minimally different sentence pairs, and compare how these representations change through the forward pass of the PLM network. Second, by probing…
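The first step described in the abstract, comparing the representations of the same polysemous word across a minimally different sentence pair at each sub-layer, can be illustrated with a minimal sketch. The excerpt does not state which comparison metric the authors use, so cosine similarity is an assumption here. The function names, the sub-layer keys, and the three-dimensional toy vectors are all hypothetical; in practice the vectors would be extracted from the PLM itself.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_by_sublayer(reps_a, reps_b):
    """Compare one target word's vectors across two sentences.

    reps_a and reps_b map a sub-layer name to the target word's
    representation in sentence A and sentence B respectively.
    """
    return {
        name: cosine_similarity(reps_a[name], reps_b[name])
        for name in reps_a
    }


# Toy, made-up vectors (assumption: not real model activations).
# They stand in for one polysemous word seen in two contexts.
reps_sentence_a = {
    "self_attention": [1.0, 0.0, 0.0],
    "ff_activation": [1.0, 1.0, 0.0],
    "output": [0.0, 1.0, 0.0],
}
reps_sentence_b = {
    "self_attention": [1.0, 0.0, 0.0],
    "ff_activation": [1.0, 0.0, 0.0],
    "output": [1.0, 0.0, 0.0],
}

sims = similarity_by_sublayer(reps_sentence_a, reps_sentence_b)
for name, value in sims.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, a lower similarity at a given sub-layer suggests that the word's representation has absorbed more of the differing context at that point in the forward pass.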

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Attention Dropout · Dense Connections · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay