Expected Validation Performance and Estimation of a Random Variable's Maximum

Jesse Dodge; Suchin Gururangan; Dallas Card; Roy Schwartz; Noah A. Smith

arXiv:2110.00613 · cs.CL · October 5, 2021

TL;DR

This paper evaluates three statistical estimators for expected validation performance in NLP, analyzing their bias, variance, and MSE to improve model comparison accuracy and reproducibility.

Contribution

The paper analyzes the bias, variance, and MSE of three estimators for expected validation performance, highlighting the bias-variance tradeoff and its effect on model comparison.

Findings

01. The unbiased estimator has the highest variance.

02. The estimator with the smallest variance has the largest bias.

03. The estimator with the smallest MSE balances bias and variance.
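
The three findings are tied together by the standard decomposition of mean squared error, written out here as a reminder. The symbols are assumptions for illustration and are not taken from the paper: \hat{\theta} stands for any estimator and \theta for the quantity it estimates.

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2 + \mathrm{Var}(\hat{\theta})
  = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta})
```

Under this identity an unbiased estimator can still have a large MSE if its variance is large, and a low-variance estimator can have a large MSE if its bias is large.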

Abstract

Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments). Where previous work analyzing such estimators focused on the bias, we also examine the variance and mean squared error (MSE). In both synthetic and realistic scenarios, we evaluate three estimators and find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias; the estimator with the smallest MSE strikes a balance between bias and variance, displaying a classic bias-variance tradeoff. We use expected validation performance to compare…
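
To make the quantity concrete, here is a minimal Python sketch of two generic ways to estimate the expected maximum of n draws from N observed validation scores. The abstract does not give formulas, so whether these match any of the paper's three estimators is an assumption; the function names are hypothetical. The first is a plug-in estimate based on the empirical distribution (drawing with replacement); the second averages the maximum over all size-n subsets (drawing without replacement).

```python
from math import comb


def expected_max_with_replacement(scores, n):
    """Plug-in estimate: expected max of n draws WITH replacement from scores."""
    v = sorted(scores)
    N = len(v)
    # Under the empirical distribution, P(max <= v[i]) = ((i + 1) / N) ** n.
    return sum(v[i] * (((i + 1) / N) ** n - (i / N) ** n) for i in range(N))


def expected_max_without_replacement(scores, n):
    """Average of the max over all size-n subsets of scores (needs 1 <= n <= N)."""
    v = sorted(scores)
    N = len(v)
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= len(scores)")
    # v[i] is the max of a subset when the other n - 1 members come from
    # the i smaller values, which can happen in comb(i, n - 1) ways.
    return sum(v[i] * comb(i, n - 1) for i in range(N)) / comb(N, n)


if __name__ == "__main__":
    # Hypothetical validation accuracies from 5 hyperparameter trials.
    accuracies = [0.71, 0.74, 0.69, 0.80, 0.77]
    for budget in range(1, len(accuracies) + 1):
        print(
            budget,
            expected_max_with_replacement(accuracies, budget),
            expected_max_without_replacement(accuracies, budget),
        )
```

Plotting either estimate against the budget n gives a curve of expected validation performance as a function of the number of hyperparameter trials. For n greater than 1 the two estimates generally differ on the same scores, which is one way to see that the choice of estimator matters.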

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Machine Learning and Data Classification