Evaluating Deep Neural Networks in Deployment (A Comparative and Replicability Study)

Eduard Pinconschi; Divya Gopinath; Rui Abreu; Corina S. Pasareanu

arXiv:2407.08730 · cs.NE · July 30, 2024 · 1 citation

TL;DR

This paper compares recent methods for evaluating the reliability of deep neural networks in deployment, highlighting reproducibility issues and proposing a unified evaluation framework with common benchmarks and metrics.

Contribution

It provides a comprehensive comparison of existing evaluation approaches and introduces a standardized framework for assessing DNN reliability in safety-critical applications.

Findings

1. Difficulty in reproducing results across different approaches
2. Lack of standardized evaluation metrics
3. Need for unified evaluation frameworks

Abstract

As deep neural networks (DNNs) are increasingly used in safety-critical applications, there is a growing concern for their reliability. Even highly trained, high-performant networks are not 100% accurate. However, it is very difficult to predict their behavior during deployment without ground truth. In this paper, we provide a comparative and replicability study on recent approaches that have been proposed to evaluate the reliability of DNNs in deployment. We find that it is hard to run and reproduce the results for these approaches on their replication packages and even more difficult to run them on artifacts other than their own. Further, it is difficult to compare the effectiveness of the approaches, due to the lack of clearly defined evaluation metrics. Our results indicate that more effort is needed in our research community to obtain sound techniques for evaluating the reliability…
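To make the point about clearly defined evaluation metrics concrete, here is a minimal sketch of one possible way to score a deployment-time reliability monitor. It is not taken from the paper. It assumes a hypothetical setup in which a monitor flags DNN predictions it believes are wrong, and the monitor is scored offline against held-out ground truth using precision, recall, and F1 over mispredictions. The names `score_misprediction_detector` and `DetectionScores` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DetectionScores:
    precision: float
    recall: float
    f1: float


def score_misprediction_detector(y_true, y_pred, flagged):
    """Score a (hypothetical) monitor that flags predictions it deems wrong.

    y_true:  ground-truth labels, available only in an offline evaluation
    y_pred:  labels predicted by the DNN
    flagged: booleans, True where the monitor flagged the prediction
    """
    if not (len(y_true) == len(y_pred) == len(flagged)):
        raise ValueError("inputs must have the same length")

    # A prediction is a true misprediction when it differs from the label.
    wrong = [t != p for t, p in zip(y_true, y_pred)]

    tp = sum(w and f for w, f in zip(wrong, flagged))        # wrong and flagged
    fp = sum((not w) and f for w, f in zip(wrong, flagged))  # correct but flagged
    fn = sum(w and (not f) for w, f in zip(wrong, flagged))  # wrong but missed

    # Define each ratio as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return DetectionScores(precision, recall, f1)


if __name__ == "__main__":
    scores = score_misprediction_detector(
        y_true=[0, 1, 1, 0, 1],
        y_pred=[0, 1, 0, 1, 1],
        flagged=[False, True, True, False, False],
    )
    print(scores)
```

Fixing the positive class (here, a misprediction) and the zero-denominator convention up front is the kind of detail that makes numbers from different approaches comparable.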

Code & Models

Repositories

trustdnn/issta2024 (Official)

Taxonomy

Topics: Digital Transformation in Industry