On the data requirements of probing
Zining Zhu, Jixuan Wang, Bai Li, Frank Rudzicz

TL;DR
This paper introduces a quantitative method for estimating how many data samples a probing dataset needs in order to reliably distinguish two probing configurations of a neural language model, weighing reliability against data collection cost.
Contribution
The paper presents a novel method for estimating, from a small pilot dataset, how many additional data samples are sufficient to distinguish two probing configurations.
Findings
Across several case studies, the estimated dataset sizes are verified to have sufficient statistical power.
The proposed approach improves the reliability of probing experiments.
The framework aids the systematic construction of probing datasets.
Abstract
As large and powerful neural language models are developed, researchers have been increasingly interested in developing diagnostic tools to probe them. There are many papers with conclusions of the form "observation X is found in model Y", using their own datasets with varying sizes. Larger probing datasets bring more reliability, but are also expensive to collect. There is yet to be a quantitative method for estimating reasonable probing dataset sizes. We tackle this omission in the context of comparing two probing configurations: after we have collected a small dataset from a pilot study, how many additional data samples are sufficient to distinguish two different configurations? We present a novel method to estimate the required number of data samples in such experiments and, across several case studies, we verify that our estimations have sufficient statistical power. Our framework…
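The question posed in the abstract (given a small pilot dataset, how many more samples are needed to tell two configurations apart?) can be illustrated with a textbook power analysis. The sketch below is not the authors' method, which the truncated abstract does not describe. It assumes a two-sided paired comparison under a normal approximation, with per-sample scores from both configurations on the same pilot items. The function names `samples_for_effect` and `additional_samples`, and the defaults `alpha=0.05` and `power=0.8`, are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def samples_for_effect(effect, alpha=0.05, power=0.8):
    """Total paired samples needed to detect a standardized effect.

    Uses the normal approximation n = ((z_{1-alpha/2} + z_{power}) / d)^2.
    This is a generic power calculation, assumed here for illustration.
    """
    if effect == 0:
        raise ValueError("effect size must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return math.ceil(((z_alpha + z_power) / abs(effect)) ** 2)


def additional_samples(pilot_a, pilot_b, alpha=0.05, power=0.8):
    """Extra samples to collect beyond a pilot of paired scores.

    pilot_a and pilot_b hold per-item scores of the two configurations
    on the same pilot items (at least two items are required).
    """
    diffs = [a - b for a, b in zip(pilot_a, pilot_b)]
    effect = mean(diffs) / stdev(diffs)  # standardized mean difference
    total = samples_for_effect(effect, alpha, power)
    return max(0, total - len(diffs))


if __name__ == "__main__":
    # Hypothetical pilot scores, not data from the paper.
    config_a = [0.8, 0.7, 0.9, 0.6]
    config_b = [0.7, 0.7, 0.7, 0.6]
    print(additional_samples(config_a, config_b))
```

An effect size estimated from a very small pilot is itself noisy, so a number produced this way is best read as a rough planning figure rather than a guarantee of the stated power.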
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning and Data Classification
