Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

Yaoqing Yang; Ryan Theisen; Liam Hodgkinson; Joseph E. Gonzalez; Kannan Ramchandran; Charles H. Martin; Michael W. Mahoney

arXiv:2202.02842 · cs.CL · June 6, 2023

TL;DR

This paper evaluates data-free generalization metrics (metrics that need no access to training or testing data) on NLP models, demonstrating their effectiveness for model selection among large pretrained Transformers and highlighting the usefulness of heavy-tail-based metrics.

Contribution

It evaluates data-independent generalization metrics on large NLP models, extending prior work that focused primarily on computer vision.

Findings

01. Heavy-tail-based metrics show strong correlation with test error on NLP tasks.
02. Metrics derived from power-law fits to spectral distributions outperform the other metrics evaluated.
03. First application of data-independent metrics to large pretrained NLP Transformers.
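Power-law metrics of the kind in Finding 02 can be computed from weights alone. For a layer's weight matrix W, one takes the eigenvalues of W^T W (its squared singular values) and fits a power law p(λ) ∝ λ^(-α) to the tail of that empirical spectral density. The sketch below shows only the fitting step, using the standard continuous maximum-likelihood estimator α = 1 + n / Σ ln(λ_i / λ_min) over the n eigenvalues with λ_i ≥ λ_min. It is an illustrative assumption, not the paper's code. The function name `power_law_alpha` is hypothetical, λ_min is passed in by hand (tools such as WeightWatcher typically choose it by minimizing a Kolmogorov-Smirnov distance), and the demo uses synthetic Pareto samples in place of real eigenvalues.

```python
import math
import random
from typing import Sequence


def power_law_alpha(eigenvalues: Sequence[float], xmin: float) -> float:
    """Continuous power-law MLE for the tail of an empirical spectral density.

    Hypothetical helper: returns alpha in p(x) ~ x**(-alpha), fitted only to
    values x >= xmin. xmin is supplied by the caller (no automatic selection).
    """
    if xmin <= 0:
        raise ValueError("xmin must be positive")
    tail = [x for x in eigenvalues if x >= xmin]
    if len(tail) < 2:
        raise ValueError("need at least two values in the tail")
    # Sum of log-ratios to the tail cutoff; zero only if every value equals xmin.
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("degenerate tail: all values equal xmin")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Synthetic stand-in for eigenvalues: random.paretovariate(2.0) draws from a
    # density proportional to x**-3 on x >= 1, so the estimate should be near 3.
    random.seed(0)
    samples = [random.paretovariate(2.0) for _ in range(20000)]
    print(f"estimated alpha: {power_law_alpha(samples, xmin=1.0):.3f}")
```

In a real pipeline the eigenvalues would come from something like `numpy.linalg.svd(W, compute_uv=False) ** 2` for each layer, with the per-layer α values then aggregated (for example, averaged) into a single model-level metric.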

Abstract

Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strongly with test performance. In this paper, we expand on prior analyses by examining generalization-metric-based model selection with the following objectives: (i) focusing on natural language processing (NLP) tasks, as prior work primarily concentrates on computer vision (CV) tasks; (ii) considering metrics that directly predict test error instead of the generalization gap; (iii) exploring metrics that do not need access to data to compute. From these objectives, we are able to…
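Whether a metric "correlates strongly with test performance" is commonly checked with a rank correlation across many trained models. The sketch below assumes Spearman's rank correlation (the Pearson correlation of ranks, with tied values given their average rank). The function names and the five metric/test-error pairs in the demo are hypothetical, invented only to exercise the code, and are not results from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(metric: Sequence[float], test_error: Sequence[float]) -> float:
    """Spearman rank correlation between a generalization metric and test error."""
    if len(metric) != len(test_error) or len(metric) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(metric), average_ranks(test_error)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("a constant sequence has no rank correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values for five checkpoints, invented for illustration only.
    metric_values = [2.1, 2.8, 3.5, 4.0, 4.6]
    test_errors = [0.12, 0.15, 0.19, 0.22, 0.30]
    print(f"Spearman rho: {spearman(metric_values, test_errors):+.3f}")
```

A value near +1 means the metric orders the models the same way test error does; whether +1 or -1 is the desired sign depends on whether a larger metric value is supposed to indicate worse generalization.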

Code & Models

Repositories

nsfzyzz/generalization_metrics_for_nlp (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Topic Modeling · Machine Learning and Algorithms

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Dropout · Weight Decay · Softmax · Linear Warmup With Linear Decay