TL;DR
This paper introduces Bayesian mutual information, a framework for information-theoretic probing. It addresses the limitations of standard mutual information when only finite data is available, and it yields more intuitive conclusions about what representations encode.
Contribution
It proposes a novel Bayesian mutual information measure that better captures information in finite data settings and applies it to probing to assess extractability of information.
Findings
Under Bayesian MI, data can add information, processing can help, and information can hurt.
The framework gives more intuitive results about what representations encode when data is finite.
Application to probing shows how background knowledge affects information extraction.
Abstract
Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective. They argue that probing should be seen as approximating a mutual information. This led to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences. The mutual information, however, assumes the true probability distribution of a pair of random variables is known, leading to unintuitive results in settings where it is not. This paper proposes a new framework to measure what we term Bayesian mutual information, which analyses information from the perspective of Bayesian agents -- allowing for more intuitive findings in scenarios with finite data. For instance, under Bayesian MI we have that data can add information, processing can help, and information can hurt, which makes it more intuitive for machine learning…
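The "unintuitive conclusion" mentioned in the abstract can be reproduced on a toy example. The sketch below is not the paper's Bayesian MI; it only illustrates the standard mutual information with a fully known joint distribution. The joint distribution, the encoder names (`injective_encoder`, `lossy_encoder`), and the helper functions are all hypothetical and chosen for illustration. With the true distribution known, an injective encoder leaves the mutual information with the target unchanged, and a lossy encoder can only lower it (the data-processing inequality).

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Return I(X; Y) in bits for a joint given as {(x, y): p(x, y)}."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def push_forward(joint, encoder):
    """Return the joint of (encoder(X), Y), merging inputs that collide."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(encoder(x), y)] += p
    return dict(out)


# Hypothetical true joint over four "sentences" and a binary target label.
joint_sentence_target = {
    ("s0", 0): 0.20, ("s0", 1): 0.05,
    ("s1", 0): 0.05, ("s1", 1): 0.20,
    ("s2", 0): 0.15, ("s2", 1): 0.10,
    ("s3", 0): 0.10, ("s3", 1): 0.15,
}

# Hypothetical encoders: one injective (distinct vector per sentence),
# one lossy (pairs of sentences collapse to the same representation).
injective_encoder = {
    "s0": (0.1, 0.9), "s1": (0.8, 0.2), "s2": (0.3, 0.3), "s3": (0.7, 0.6),
}.get
lossy_encoder = {"s0": "a", "s2": "a", "s1": "b", "s3": "b"}.get

mi_sentences = mutual_information(joint_sentence_target)
mi_injective = mutual_information(
    push_forward(joint_sentence_target, injective_encoder)
)
mi_lossy = mutual_information(push_forward(joint_sentence_target, lossy_encoder))

print(f"I(T; S)           = {mi_sentences:.4f} bits")
print(f"I(T; injective R) = {mi_injective:.4f} bits")
print(f"I(T; lossy R)     = {mi_lossy:.4f} bits")
```

Under these assumptions, processing never adds information about the target. That is the behaviour the abstract describes as unintuitive for learning from finite data, and the one the Bayesian MI framework is meant to relax.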
