TL;DR
This paper introduces DeepEST, a test selection method that actively seeks mispredictions in operational datasets to estimate DNN accuracy more effectively and identify potential bugs, outperforming existing techniques.
Contribution
DeepEST is a novel active sampling technique that prioritizes failing test cases to improve accuracy estimation and bug detection in DNN testing.
Findings
DeepEST detects 5 to 30 times more mispredictions than existing methods.
DeepEST achieves accuracy estimates comparable to or better than those of current sampling techniques.
DeepEST needs smaller test suites to uncover a significant number of mispredictions.
Abstract
Deep Neural Networks (DNNs) are typically tested for accuracy using a set of unlabelled real-world data (the operational dataset), from which a subset is selected, manually labelled, and used as the test suite. This subset must be small (because of the cost of manual labelling) yet faithfully represent the operational context, so that the resulting test suite contains roughly the same proportion of examples causing mispredictions (i.e., failing test cases) as the operational dataset. However, while testing to estimate accuracy, it is also desirable to learn as much as possible from the failing tests in the operational dataset, since they point to possible bugs in the DNN. A smart sampling strategy may make it possible to intentionally include many misprediction-causing examples in the test suite, thereby providing more valuable inputs for DNN improvement while preserving the ability to get…
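The trade-off described in the abstract (over-sampling likely failures without biasing the accuracy estimate) can be illustrated with a generic importance-weighted sampler. This is a minimal sketch and not DeepEST's actual algorithm: it assumes the model's prediction confidence is available as an auxiliary signal, draws examples with replacement with probability proportional to `1 - confidence`, and corrects for the skew with the standard Hansen-Hurwitz estimator. The function name, the `is_failure` labelling oracle, and the `eps` smoothing term are all hypothetical.

```python
import random


def estimate_accuracy_weighted(confidences, is_failure, n, eps=0.05, seed=0):
    """Illustrative sketch only (not the DeepEST algorithm).

    confidences: model confidence for each operational example (assumed signal).
    is_failure:  hypothetical oracle standing in for manual labelling;
                 returns True if example i is mispredicted.
    n:           number of draws (with replacement).
    """
    rng = random.Random(seed)
    size = len(confidences)

    # Low-confidence examples get a higher selection probability.
    # eps keeps every example selectable, which the estimator requires.
    weights = [(1.0 - c) + eps for c in confidences]
    total = sum(weights)
    probs = [w / total for w in weights]

    drawn = rng.choices(range(size), weights=probs, k=n)

    # Only the distinct drawn examples need to be labelled.
    labelled = {i: int(is_failure(i)) for i in set(drawn)}

    # Hansen-Hurwitz: the mean of y_i / p_i estimates the total failure count.
    est_failures = sum(labelled[i] / probs[i] for i in drawn) / n
    accuracy = 1.0 - est_failures / size

    failures_found = sum(labelled.values())
    return accuracy, failures_found


if __name__ == "__main__":
    # Synthetic data: 50 low-confidence failures among 1000 examples.
    confidences = [0.3] * 50 + [0.95] * 950
    acc, found = estimate_accuracy_weighted(confidences, lambda i: i < 50, n=200)
    print(f"estimated accuracy: {acc:.3f}, distinct failures found: {found}")
```

On this synthetic data, uniform sampling of 200 examples would be expected to hit about 10 of the 50 failures, whereas the weighted draw concentrates on them; dividing each observation by its selection probability is what keeps the accuracy estimate unbiased despite the skewed sample.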
