TL;DR
This paper introduces Bird's Eye, an information-theoretic probe for detecting linguistic graph structures in language models, and demonstrates its effectiveness in analyzing BERT's encoding of syntactic and semantic information.
Contribution
The paper presents a novel, simple mutual information-based probing method and a localized analysis approach for understanding linguistic graph encoding in contextualized representations.
Findings
BERT encodes both syntactic and semantic information.
Syntactic information is encoded to a greater extent than semantic information.
The proposed probes effectively analyze linguistic structures.
Abstract
NLP has a rich history of representing our prior understanding of language in the form of graphs. Recent work on analyzing contextualized text representations has focused on hand-designed probe models to understand how and to what extent these representations encode a particular linguistic phenomenon. However, due to the interdependence of various phenomena and the randomness of training probe models, detecting how these representations encode the rich information in these linguistic graphs remains a challenging problem. In this paper, we propose a new information-theoretic probe, Bird's Eye, which is a fairly simple probe method for detecting if and how these representations encode the information in these linguistic graphs. Instead of using classifier performance, our probe takes an information-theoretic view of probing and estimates the mutual information between the linguistic graph…
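As a rough illustration of the quantity the abstract refers to, the sketch below computes a textbook plug-in estimate of mutual information between two paired sequences of discrete labels. It is not taken from the paper and is not the Bird's Eye estimator; the function name `mutual_information` and the toy inputs are assumptions made only for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y), in nats, from paired discrete samples.

    Illustrative only: a generic estimator for discrete variables,
    not the estimator used by the paper.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Toy usage: identical labels share all their information,
# while independent labels share none.
same = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy case the first value equals the entropy of a fair binary label (log 2 nats) and the second is zero, which matches the intuition that a higher estimate means one variable tells you more about the other.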
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Softmax · WordPiece · Dense Connections · Adam · Layer Normalization
