A Principled Evaluation Protocol for Comparative Investigation of the Effectiveness of DNN Classification Models on Similar-but-non-identical Datasets

Esla Timothy Anzaku; Haohan Wang; Arnout Van Messem; Wesley De Neve

arXiv:2209.01848 · cs.LG · September 7, 2022 · 1 citation

TL;DR

This paper introduces a new evaluation protocol for DNN classification models that uses uncertainty-based data subset selection. When accuracy is compared on such subsets, models perform better on replication datasets than previously reported under conventional evaluation on the full test sets.

Contribution

The authors propose a principled evaluation protocol that leverages uncertainty to select subsets of datapoints, allowing DNN accuracy to be compared across datasets more realistically than with conventional evaluation on all available datapoints.

Findings

1. Models perform better on replication datasets than previously reported.
2. Traditional evaluation methods may overestimate accuracy degradation.
3. Uncertainty-based subset selection provides more realistic performance estimates.

Abstract

Deep Neural Network (DNN) models are increasingly evaluated using new replication test datasets, which have been carefully created to be similar to older and popular benchmark datasets. However, running counter to expectations, DNN classification models show significant, consistent, and largely unexplained degradation in accuracy on these replication test datasets. While the popular evaluation approach is to assess the accuracy of a model by making use of all the datapoints available in the respective test datasets, we argue that doing so hinders us from adequately capturing the behavior of DNN models and from having realistic expectations about their accuracy. Therefore, we propose a principled evaluation protocol that is suitable for performing comparative investigations of the accuracy of a DNN model on multiple test datasets, leveraging subsets of datapoints that can be selected…
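The abstract is cut off before it states how the subsets are selected, and the TL;DR only says the selection is uncertainty-based. The sketch below is therefore a hypothetical illustration, not the authors' protocol: it assumes uncertainty is measured by the predictive entropy of a model's softmax output, and that one shared entropy threshold is applied to both the original and the replication test set, so that accuracy is compared on subsets chosen by a common rule. The function names (`predictive_entropy`, `select_subset`, `compare`), the threshold value, and the toy softmax outputs are all made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A softmax probability vector for one datapoint.
Probs = Sequence[float]


def predictive_entropy(probs: Probs) -> float:
    """Shannon entropy (nats) of one softmax vector; higher means more uncertain.

    ASSUMPTION: entropy is used here as the uncertainty measure; the paper's
    actual criterion is not given in the truncated abstract.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(probs: Probs) -> int:
    """Index of the highest-probability class, i.e. the model's prediction."""
    return max(range(len(probs)), key=lambda i: probs[i])


def accuracy(probs_list: Sequence[Probs], labels: Sequence[int]) -> float:
    """Top-1 accuracy; NaN for an empty set so an empty subset is not scored as 0."""
    if not labels:
        return float("nan")
    correct = sum(argmax(p) == y for p, y in zip(probs_list, labels))
    return correct / len(labels)


def select_subset(
    probs_list: Sequence[Probs], labels: Sequence[int], max_entropy: float
) -> Tuple[List[Probs], List[int]]:
    """Keep only datapoints whose predictive entropy is at most max_entropy."""
    kept = [(p, y) for p, y in zip(probs_list, labels)
            if predictive_entropy(p) <= max_entropy]
    return [p for p, _ in kept], [y for _, y in kept]


def compare(original, replication, max_entropy: float) -> Tuple[float, float]:
    """Accuracy gap (original minus replication) on all data and on the subsets.

    The SAME threshold is applied to both datasets (hypothetical design choice).
    """
    full_gap = accuracy(*original) - accuracy(*replication)
    sub_orig = select_subset(*original, max_entropy)
    sub_repl = select_subset(*replication, max_entropy)
    subset_gap = accuracy(*sub_orig) - accuracy(*sub_repl)
    return full_gap, subset_gap


# Toy, made-up softmax outputs and labels (NOT data from the paper).
original = (
    [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.6, 0.4]],
    [0, 0, 0, 1],
)
replication = (
    [[0.9, 0.1], [0.55, 0.45], [0.52, 0.48], [0.85, 0.15]],
    [0, 1, 1, 0],
)

full_gap, subset_gap = compare(original, replication, max_entropy=0.55)
print(full_gap, subset_gap)  # prints: 0.25 0.0
```

In this toy case, the 0.25 accuracy gap measured on the full sets vanishes on the low-entropy subsets, because the replication set's extra errors all fall on high-uncertainty datapoints. This only illustrates the kind of effect the findings describe; no conclusion about real models should be drawn from made-up numbers.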

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)

Methods: Test