Predicting Fine-Tuning Performance with Probing
Zining Zhu, Soroosh Shahtalebi, Frank Rudzicz

TL;DR
This paper demonstrates that probing tests on large NLP models can effectively predict their fine-tuning performance, offering a lightweight diagnostic tool for model development.
Contribution
It introduces a method to predict fine-tuning success using only three probing tests, reducing prediction errors significantly compared to baselines.
Findings
Probing accuracies can predict fine-tuning performance with errors 40-80% smaller than baselines.
Using three probing tests suffices to estimate fine-tuning outcomes.
Probing can serve as a lightweight proxy signal in NLP model development.
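The findings above can be illustrated with a minimal sketch of the general idea: fit a small regressor that maps three probing accuracies per model to that model's fine-tuning score, then compare its held-out error against a trivial baseline. Everything below is an assumption made for illustration: the summary does not state which regressor or baselines the authors use, so ordinary least squares and a mean-prediction baseline are stand-ins, the helper names are hypothetical, and the data is synthetic rather than taken from the paper.

```python
import numpy as np


def fit_linear_predictor(probe_acc, finetune_score):
    """Least-squares fit of finetune_score ~ probe_acc @ w + b (assumed model form)."""
    X = np.column_stack([probe_acc, np.ones(len(probe_acc))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, finetune_score, rcond=None)
    return coef  # last entry is the bias term


def predict(coef, probe_acc):
    """Apply the fitted weights and bias to new probing accuracies."""
    X = np.column_stack([probe_acc, np.ones(len(probe_acc))])
    return X @ coef


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted scores."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def error_reduction(probe_acc, finetune_score, n_train):
    """Held-out RMSE of the probing predictor vs. a mean-prediction baseline."""
    X_tr, X_te = probe_acc[:n_train], probe_acc[n_train:]
    y_tr, y_te = finetune_score[:n_train], finetune_score[n_train:]
    coef = fit_linear_predictor(X_tr, y_tr)
    probe_rmse = rmse(y_te, predict(coef, X_te))
    # Hypothetical baseline: always predict the mean training score.
    baseline_rmse = rmse(y_te, np.full_like(y_te, y_tr.mean()))
    return probe_rmse, baseline_rmse, 1.0 - probe_rmse / baseline_rmse


# Synthetic stand-in data: 40 hypothetical models, 3 probing accuracies each.
rng = np.random.default_rng(0)
probe_acc = rng.uniform(0.5, 0.95, size=(40, 3))
true_w = np.array([0.4, 0.3, 0.2])  # made-up weights, not from the paper
finetune_score = probe_acc @ true_w + 0.1 + rng.normal(0, 0.01, size=40)

probe_rmse, baseline_rmse, reduction = error_reduction(
    probe_acc, finetune_score, n_train=30
)
print(f"probing RMSE={probe_rmse:.4f}  baseline RMSE={baseline_rmse:.4f}  "
      f"relative reduction={reduction:.1%}")
```

The printed reduction reflects only the synthetic data generated here and says nothing about the paper's reported numbers. With real probing accuracies, the same comparison would need the baselines the authors actually define.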
Abstract
Large NLP models have recently shown impressive performance in language understanding tasks, typically evaluated by their fine-tuned performance. Alternatively, probing has received increasing attention as being a lightweight method for interpreting the intrinsic mechanisms of large NLP models. In probing, post-hoc classifiers are trained on "out-of-domain" datasets that diagnose specific abilities. While probing the language models has led to insightful findings, they appear disjointed from the development of models. This paper explores the utility of probing deep NLP models to extract a proxy signal widely used in model development -- the fine-tuning performance. We find that it is possible to use the accuracies of only three probing tests to predict the fine-tuning performance with errors 40% - 80% smaller than baselines. We further discuss possible avenues where probing can…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
