A Latent-Variable Model for Intrinsic Probing

Karolina Stańczak; Lucas Torroba Hennigen; Adina Williams; Ryan Cotterell; Isabelle Augenstein

arXiv:2201.08214 · cs.CL · August 8, 2025 · 1 citation

TL;DR

This paper introduces a latent-variable model for intrinsic probing of pre-trained language models. The model yields tighter mutual information estimates of how much linguistic information the representations encode, and it reveals cross-lingual entanglement in morphosyntax.

Contribution

It proposes a novel latent-variable formulation for intrinsic probing with a tractable variational approximation, improving mutual information estimation over previous methods.
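
As a rough illustration of what a variational approximation to a latent-variable log-likelihood looks like, the block below writes out the generic Jensen lower bound. The notation is assumed for this sketch and is not taken from the summary above: \pi is the value of the linguistic attribute, h is the representation, C is a latent variable (for example, the subset of dimensions the probe may use), and q(C) is a variational distribution. The bound actually derived in the paper may differ in its details.

```latex
\begin{align}
\log p(\pi \mid h)
  &= \log \sum_{C} p(\pi \mid h, C)\, p(C) \\
  &= \log \sum_{C} q(C)\, \frac{p(\pi \mid h, C)\, p(C)}{q(C)} \\
  &\geq \sum_{C} q(C) \log \frac{p(\pi \mid h, C)\, p(C)}{q(C)} \\
  &= \mathbb{E}_{q(C)}\!\left[\log p(\pi \mid h, C)\right]
     - \mathrm{KL}\!\left(q(C) \,\|\, p(C)\right)
\end{align}
```

The inequality is Jensen's inequality applied to the concave logarithm. The right-hand side can be estimated from samples of q(C), which is what makes the otherwise intractable sum over C workable.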

Findings

01 The model yields tighter mutual information estimates than the two intrinsic probes it is compared against.

02 Pre-trained representations encode cross-lingually entangled morphosyntactic information.

03 The approach is versatile and improves analysis of linguistic attributes.

Abstract

The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic information. Indeed, it is natural to assume that these pre-trained representations do encode some level of linguistic knowledge as they have brought about large empirical improvements on a wide variety of NLP tasks, which suggests they are learning true linguistic generalization. In this work, we focus on intrinsic probing, an analysis technique where the goal is not only to identify whether a representation encodes a linguistic attribute but also to pinpoint where this attribute is encoded. We propose a novel latent-variable formulation for constructing intrinsic probes and derive a tractable variational approximation to the log-likelihood. Our results show that our model is versatile and yields tighter mutual information estimates than two intrinsic probes…
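
To make "tighter mutual information estimates" concrete, here is a minimal sketch of one common way a probe is turned into a mutual information estimate: subtract the probe's cross-entropy on the gold labels from a plug-in estimate of the label entropy. Since cross-entropy upper-bounds the true conditional entropy in expectation, the difference is a lower bound on the mutual information, and a probe with lower cross-entropy gives a tighter bound. The function names, the toy labels, and the two hypothetical probes are assumptions made for illustration; they are not taken from the paper or its code.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Plug-in estimate of the label entropy H(Pi), in bits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cross_entropy_bits(probe_probs, labels):
    """Average negative log2-probability the probe assigns to each gold label."""
    total = sum(math.log2(p[y]) for p, y in zip(probe_probs, labels))
    return -total / len(labels)


def mi_lower_bound_bits(probe_probs, labels):
    """Estimate of I(Pi; H) as H(Pi) minus the probe's cross-entropy."""
    return entropy_bits(labels) - cross_entropy_bits(probe_probs, labels)


# Toy data (hypothetical): gold values of a binary attribute such as Number.
labels = ["Sing", "Plur", "Sing", "Plur"]

# A probe that ignores the representation and always predicts uniformly.
weak_probe = [{"Sing": 0.5, "Plur": 0.5}] * 4

# A probe that puts 0.9 on the correct value for every token.
strong_probe = [{"Sing": 0.9, "Plur": 0.1}, {"Sing": 0.1, "Plur": 0.9}] * 2

weak_mi = mi_lower_bound_bits(weak_probe, labels)
strong_mi = mi_lower_bound_bits(strong_probe, labels)
print(f"weak probe:   {weak_mi:.3f} bits")
print(f"strong probe: {strong_mi:.3f} bits")
```

In practice the cross-entropy would be computed on held-out data, and the comparison between probes is only meaningful when they are evaluated on the same data.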

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution