TL;DR
This paper investigates how the context length of input spans influences the interpretation of BERT's processing patterns, revealing that variation in context length, when not controlled for, can lead to contradictory conclusions about what the model prioritizes.
Contribution
It highlights the mediating role of context length in neural model probing and provides best practices to ensure more reliable localization of processing in BERT.
Findings
Context length significantly affects probing results.
When probing BERT with seven tasks, manipulating the distribution of context lengths in the probing dataset can yield 196 different rankings between the tasks.
Best practices are proposed for future probing studies.
Abstract
Probing neural models for the ability to perform downstream tasks using their activation patterns is often used to localize which parts of the network specialize in performing which tasks. However, little work has addressed potential mediating factors in such comparisons. As a test-case mediating factor, we consider the prediction's context length, namely the length of the span whose processing is minimally required to perform the prediction. We show that not controlling for context length may lead to contradictory conclusions as to the localization patterns of the network, depending on the distribution of the probing dataset. Indeed, when probing BERT with seven tasks, we find that it is possible to get 196 different rankings between them when manipulating the distribution of context lengths in the probing dataset. We conclude by presenting best practices for conducting such comparisons in…
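The effect described in the abstract can be illustrated with a small toy sketch. It is not the paper's procedure or data: the task names, the two context-length bins, and every score below are hypothetical, and "score" stands in for whatever per-task statistic a probing study compares. The sketch only shows that when per-task scores depend on context length, reweighting the mix of context lengths in the probing dataset can flip the ranking between tasks.

```python
# Toy illustration only: all task names, bins, and scores are hypothetical.
# SCORES[task][bin] is a stand-in probe statistic for examples whose
# context length falls in that bin.
SCORES = {
    "task_a": {"short": 0.90, "long": 0.60},
    "task_b": {"short": 0.80, "long": 0.75},
}


def expected_score(per_bin, weights):
    """Average a task's per-bin scores under a context-length distribution."""
    total = sum(weights.values())
    return sum(per_bin[b] * w for b, w in weights.items()) / total


def rank_tasks(scores, weights):
    """Return task names sorted from highest to lowest expected score."""
    expected = {t: expected_score(bins, weights) for t, bins in scores.items()}
    return sorted(expected, key=expected.get, reverse=True)


# Two probing datasets that differ only in their mix of context lengths.
mostly_short = {"short": 0.9, "long": 0.1}
mostly_long = {"short": 0.1, "long": 0.9}

print(rank_tasks(SCORES, mostly_short))
print(rank_tasks(SCORES, mostly_long))
```

With these made-up numbers the two datasets order the tasks differently, even though the per-bin scores never change. Comparing tasks within the same context-length bin is one way to keep the comparison from depending on the dataset's length distribution.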
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Linear Warmup With Linear Decay · Weight Decay · Adam · WordPiece · Dropout
