When and Why Test Generators for Deep Learning Produce Invalid Inputs: an Empirical Study

Vincenzo Riccio; Paolo Tonella

arXiv:2212.11368·cs.SE·December 23, 2022

TL;DR

This empirical study evaluates the validity of inputs produced by test input generators for deep learning. It finds that most generated inputs are valid but do not always preserve their expected labels, and that automated validators largely agree with human judgment.

Contribution

The paper provides a comprehensive empirical analysis of input validity in deep learning test generators, comparing automated and human validation across multiple datasets and generators.

Findings

01 · 84% of generated inputs are valid according to automated validators

02 · Automated validators agree with humans 78% of the time

03 · Generated inputs often do not preserve the expected labels

Abstract

Testing Deep Learning (DL) based systems inherently requires large and representative test sets to evaluate whether DL systems generalise beyond their training datasets. Diverse Test Input Generators (TIGs) have been proposed to produce artificial inputs that expose issues of the DL systems by triggering misbehaviours. Unfortunately, such generated inputs may be invalid, i.e., not recognisable as part of the input domain, thus providing an unreliable quality assessment. Automated validators can ease the burden of manually checking the validity of inputs for human testers, although input validity is a concept difficult to formalise and, thus, automate. In this paper, we investigate to what extent TIGs can generate valid inputs, according to both automated and human validators. We conduct a large empirical study, involving 2 different automated validators, 220 human assessors, 5…

Code & Models

Repositories

testingautomated-usi/tig-validity-icse23
tf · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques · Software Engineering Research

Methods: Test