Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data
Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney

TL;DR
This paper evaluates generalization metrics that do not require access to training or testing data for NLP models, demonstrating their effectiveness in selecting large pretrained Transformers and highlighting the usefulness of heavy-tail based metrics.
Contribution
It introduces and tests data-independent generalization metrics on large NLP models, extending prior work primarily focused on computer vision.
Findings
Heavy-tail based metrics show strong correlation with test error in NLP.
Metrics derived from power law spectral distributions outperform other metrics.
First application of data-independent metrics to large pretrained NLP Transformers.
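As a rough illustration of what a heavy-tail metric measures, the sketch below fits a power-law exponent to the eigenvalue spectrum of a single weight matrix, using only the weights and no data. It is a simplification under stated assumptions, not the authors' implementation: the function names `esd` and `power_law_alpha`, the fixed `tail_fraction` cutoff, and the choice of the standard continuous power-law maximum-likelihood estimator are all made here for illustration.

```python
import numpy as np


def esd(weight):
    """Eigenvalues of W^T W, computed as the squared singular values of W."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return singular_values ** 2


def power_law_alpha(eigenvalues, tail_fraction=0.5):
    """Estimate the exponent alpha of a power-law tail p(x) ~ x^(-alpha).

    Assumption: the tail is taken to be the largest `tail_fraction` of the
    eigenvalues. This fixed cutoff is a hypothetical simplification; a more
    careful fit would also search over the lower cutoff x_min.
    """
    eigs = np.sort(np.asarray(eigenvalues, dtype=float))
    eigs = eigs[eigs > 0]
    k = max(2, int(len(eigs) * tail_fraction))
    tail = eigs[-k:]
    x_min = tail[0]
    # Continuous power-law maximum-likelihood estimate of the exponent.
    return 1.0 + k / np.sum(np.log(tail / x_min))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for one layer's weight matrix.
    weight = rng.standard_normal((512, 256))
    print("alpha estimate:", power_law_alpha(esd(weight)))
```

In this kind of analysis the exponent is computed per layer and then aggregated across layers into a single score for the model; how the aggregation is done is a design choice that this sketch leaves open.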
Abstract
Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strongly with test performance. In this paper, we expand on prior analyses by examining generalization-metric-based model selection with the following objectives: (i) focusing on natural language processing (NLP) tasks, as prior work primarily concentrates on computer vision (CV) tasks; (ii) considering metrics that directly predict test error instead of the generalization gap; (iii) exploring metrics that do not need access to data to compute. From these objectives, we are able to…
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling · Machine Learning and Algorithms
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Dropout · Weight Decay · Softmax · Linear Warmup With Linear Decay
