Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions

Antonio Guerriero; Roberto Pietrantuono; Stefano Russo

arXiv:2102.04287 · cs.SE · March 27, 2024

TL;DR

This paper introduces DeepEST, a test selection method that actively seeks mispredictions in operational datasets, so that DNN accuracy can be estimated while potential bugs are exposed at the same time. It finds more mispredictions than existing techniques, with comparable or better accuracy estimates.

Contribution

DeepEST is a novel active sampling technique that prioritizes likely failing test cases, improving bug detection in DNN testing while keeping accuracy estimates comparable to or better than those of existing sampling techniques.
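The effect of prioritizing likely failing inputs can be sketched in a few lines. This is not DeepEST's selection rule: it assumes, purely for illustration, that each operational input has a softmax confidence and that inputs are drawn with replacement with probability proportional to `(1 - confidence) + floor`. The names `selection_probabilities`, `expected_failures` and `floor`, and the toy data, are hypothetical. The sketch compares the expected number of mispredictions in a test suite of size `n` under uniform and confidence-weighted selection.

```python
import numpy as np

def selection_probabilities(confidence, floor=0.1):
    """Hypothetical scoring rule: p_i proportional to (1 - confidence_i) + floor.
    The floor keeps every input selectable, even highly confident ones."""
    score = 1.0 - np.asarray(confidence, dtype=float) + floor
    return score / score.sum()

def expected_failures(p, is_failing, n):
    """Expected number of failing inputs among n draws with replacement,
    where input i is drawn with probability p[i] at each draw."""
    return n * float(np.dot(p, is_failing))

# Toy operational dataset (made-up values for illustration only):
# model confidence per input and whether the model actually mispredicts it.
confidence = np.array([0.99, 0.97, 0.95, 0.90, 0.60, 0.55, 0.98, 0.96, 0.50, 0.99])
is_failing = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 0])

n = 5  # test suite size (labelling budget)
p_uniform = np.full(len(confidence), 1.0 / len(confidence))
p_weighted = selection_probabilities(confidence)

uniform_hits = expected_failures(p_uniform, is_failing, n)
weighted_hits = expected_failures(p_weighted, is_failing, n)
print(f"uniform: {uniform_hits:.2f} expected mispredictions, weighted: {weighted_hits:.2f}")
```

Because the weighted draw over-represents failing inputs, the plain proportion of correct predictions in such a suite would underestimate accuracy; an estimator used with this kind of sampling has to correct for the unequal selection probabilities.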

Findings

01

DeepEST detects 5 to 30 times more mispredictions than existing methods.

02

DeepEST achieves accuracy estimates comparable to or better than those of current sampling techniques.

03

DeepEST requires smaller test suites to find significant mispredictions.

Abstract

Deep Neural Networks (DNNs) are typically tested for accuracy using a set of unlabelled real-world data (the operational dataset), from which a subset is selected, manually labelled, and used as the test suite. This subset must be small (because manual labelling is costly), yet faithfully represent the operational context: the resulting test suite should contain roughly the same proportion of examples causing misprediction (i.e., failing test cases) as the operational dataset. However, while testing to estimate accuracy, it is desirable to also learn as much as possible from the failing tests in the operational dataset, since they point to possible bugs in the DNN. A smart sampling strategy can intentionally include many examples causing misprediction in the test suite, thus providing more valuable inputs for DNN improvement while preserving the ability to get…
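Sampling toward mispredictions only pays off if accuracy can still be estimated without bias. One standard way to do this under unequal-probability sampling with replacement is the Hansen-Hurwitz estimator, which reweights each labelled draw k by 1/(N p_k), where N is the size of the operational dataset. The sketch below applies it to a synthetic operational dataset. It is an illustration under stated assumptions (confidence-based selection with a hypothetical `floor` parameter, simulated labels, hypothetical function names), not the sampling algorithm or estimator implemented in DeepEST.

```python
import numpy as np

def weighted_accuracy_estimate(p, correct, n, rng):
    """Draw n inputs with replacement (probabilities p), 'label' them by reading
    `correct`, and return a Hansen-Hurwitz estimate of accuracy together with
    the number of mispredictions that ended up in the test suite."""
    N = len(p)
    idx = rng.choice(N, size=n, replace=True, p=p)
    # Each draw is reweighted by 1 / (N * p_k), so E[estimate] = mean(correct)
    estimate = float(np.mean(correct[idx] / (N * p[idx])))
    failures = int(np.sum(1.0 - correct[idx]))
    return estimate, failures

def simulate(N=1000, n=50, runs=2000, floor=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic operational dataset: the DNN is correct with probability equal to its confidence
    confidence = rng.uniform(0.5, 1.0, size=N)
    correct = (rng.uniform(size=N) < confidence).astype(float)
    # Hypothetical selection rule: favour low-confidence inputs; floor keeps every p_i > 0
    score = (1.0 - confidence) + floor
    p = score / score.sum()
    results = [weighted_accuracy_estimate(p, correct, n, rng) for _ in range(runs)]
    estimates, failures = zip(*results)
    return float(correct.mean()), float(np.mean(estimates)), float(np.mean(failures))

true_acc, mean_est, mean_fail = simulate()
print(f"true accuracy {true_acc:.3f}, mean estimate over runs {mean_est:.3f}")
print(f"mispredictions per suite: {mean_fail:.1f} "
      f"(about {50 * (1 - true_acc):.1f} expected with uniform sampling)")
```

The `floor` trades failure-seeking against variance: very small selection probabilities produce large weights 1/(N p_k), so a single draw of a high-confidence, correctly classified input can swing an individual estimate considerably, even though the estimator stays unbiased on average.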

Code & Models

Repositories

dessertlab/DeepEST (official)