Can We Use Probing to Better Understand Fine-tuning and Knowledge Distillation of the BERT NLU?
Jakub Hościłowicz, Marcin Sowański, Piotr Czubowski, Artur Janicki

TL;DR
This paper investigates the use of probing methods to understand how fine-tuning and knowledge distillation affect BERT-based NLU models, revealing limitations in current probing techniques for practical insights.
Contribution
The study critically evaluates the effectiveness of probing methods in analyzing BERT fine-tuning and distillation, highlighting the need for decodability metrics.
Findings
Probing methods in their current form are not well suited for practical analysis.
Structural, Edge, and Conditional probes do not account for decoding ease.
Quantification of information decodability is essential for practical applications.
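To make the last finding concrete, here is a minimal sketch of one crude way to look at decoding ease: train the same probe on growing amounts of data and compare the learning curves, not just the final accuracy. Everything in it is an assumption made for illustration and is not the authors' method: the representations are synthetic, the probe is a plain logistic regression trained with SGD, and the names `make_data`, `train_probe`, `run_setting`, and `BUDGETS` are hypothetical. In both settings the label is carried by the first feature with the same noise level, so the same information is present; only the scale of the uninformative features differs.

```python
import math
import random

BUDGETS = [8, 32, 128, 512]  # number of probe training examples (hypothetical choice)
DIM = 8                      # size of the synthetic "representation"


def make_data(n, distractor_scale, rng):
    """Synthetic vectors: feature 0 encodes the label, the rest are distractors."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, distractor_scale) for _ in range(DIM)]
        x[0] = (1.0 if y else -1.0) + rng.gauss(0.0, 0.5)
        data.append((x, y))
    return data


def train_probe(train, epochs=20, lr=0.1):
    """Fit a logistic-regression probe with plain SGD (no shuffling, so it is deterministic)."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in train:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            grad = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def accuracy(probe, data):
    """Fraction of examples where the sign of the probe score matches the label."""
    w, b = probe
    hits = sum((b + sum(wi * xi for wi, xi in zip(w, x)) > 0) == bool(y) for x, y in data)
    return hits / len(data)


def run_setting(distractor_scale, seed=0):
    """Held-out probe accuracy at each training budget for one synthetic setting."""
    rng = random.Random(seed)
    train = make_data(max(BUDGETS), distractor_scale, rng)
    test = make_data(500, distractor_scale, rng)
    return [accuracy(train_probe(train[:n]), test) for n in BUDGETS]


easy_curve = run_setting(distractor_scale=1.0)
hard_curve = run_setting(distractor_scale=5.0)
for name, curve in (("easy", easy_curve), ("hard", hard_curve)):
    # The mean over budgets is a rough area-under-the-learning-curve score.
    print(name, [round(a, 3) for a in curve], "mean:", round(sum(curve) / len(curve), 3))
```

Comparing the two printed curves, and in particular the accuracy at small budgets, gives a rough sense of how much data the probe needs before the label becomes recoverable. A single accuracy number measured at the largest budget hides that difference.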
Abstract
In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate purpose was to use probing to better understand practical production problems and consequently to build better NLU models. We designed experiments to see how fine-tuning changes the linguistic capabilities of BERT, what the optimal size of the fine-tuning dataset is, and what amount of information is contained in a distilled NLU based on a tiny Transformer. The results of the experiments show that the probing paradigm in its current form is not well suited to answer such questions. Structural, Edge and Conditional probes do not take into account how easy it is to decode probed information. Consequently, we conclude that quantification of information decodability is critical for many practical…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Adam · Layer Normalization · Label Smoothing · Weight Decay · Multi-Head Attention · Dense Connections
