Robust Nonparametric Hypothesis Testing to Understand Variability in Training Neural Networks
Sinjini Banerjee, Reilly Cannon, Tim Marrinan, Tony Chiang, Anand D. Sarwate

TL;DR
This paper introduces a robust hypothesis-testing method to measure the variability in neural network models' outputs, revealing differences not captured by standard test accuracy.
Contribution
It proposes a new nonparametric measure of model similarity based on pre-threshold outputs, addressing variability overlooked by traditional accuracy metrics.
Findings
Models with similar test accuracy may compute different functions.
The proposed measure detects differences in pre-threshold model outputs that accuracy alone does not capture.
The framework can be adapted to other quantities derived from trained models.
Abstract
Training a deep neural network (DNN) often involves stochastic optimization, which means each run will produce a different model. Several works suggest this variability is negligible when models have the same performance, which in the case of classification is test accuracy. However, models with similar test accuracy may not be computing the same function. We propose a new measure of closeness between classification models based on the output of the network before thresholding. Our measure is based on a robust hypothesis-testing framework and can be adapted to other quantities derived from trained models.
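To make the abstract's central point concrete, the following is a minimal sketch of how two classifiers can agree on every thresholded prediction while their pre-threshold outputs are distributed very differently. It is not the authors' measure: the two "models" are synthetic, the names logits_a, logits_b, and ks_statistic are hypothetical, and a plain two-sample Kolmogorov-Smirnov statistic stands in for the robust hypothesis-testing framework the paper proposes.

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def accuracy(logits, labels):
    """Threshold logits at zero and compare with binary labels."""
    preds = [1 if z > 0 else 0 for z in logits]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(500)]

# Hypothetical model A: large-margin logits, correct sign about 90% of the time.
logits_a = []
for y in labels:
    sign = (2 * y - 1) if rng.random() < 0.9 else -(2 * y - 1)
    logits_a.append(sign * rng.uniform(1.0, 3.0))

# Hypothetical model B: same signs as A (so identical predictions), smaller margins.
logits_b = [0.2 * z for z in logits_a]

acc_a = accuracy(logits_a, labels)
acc_b = accuracy(logits_b, labels)
d = ks_statistic(logits_a, logits_b)

print(f"accuracy A = {acc_a:.3f}, accuracy B = {acc_b:.3f}")
print(f"KS statistic between pre-threshold outputs = {d:.3f}")
```

In this toy setup the two accuracies are identical by construction, yet the statistic on the pre-threshold outputs is large. A comparison based only on test accuracy would treat the two models as interchangeable.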
Taxonomy
Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
