Real Faults in Deep Learning Fault Benchmarks: How Real Are They?

Gunel Jahangirova, Nargiz Humbatova, Jinhan Kim, Shin Yoo, and Paolo Tonella

arXiv:2412.16336 · cs.SE · December 24, 2024

Open Access

TL;DR

This paper examines the realism and reproducibility of faults in DL fault benchmarks. It finds that only 18.5% of the faults satisfy its realism conditions and that reproduction attempts succeeded in only 52% of cases, which raises concerns about evaluations that rely on these benchmarks.

Contribution

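The study manually analyses 490 faults from five benchmarks, 314 of which are eligible. It assesses how well the bugs correspond to their sources, which fault types are represented, and whether the bugs are reproducible, exposing limitations in current DL fault benchmarks.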

Findings

01 Only 18.5% of the faults satisfy the study's realism conditions.

02 Attempts to reproduce the faults succeeded in only 52% of cases.

03 Most benchmark faults do not accurately reflect the real-world issues they were extracted from.

Abstract

As the adoption of Deep Learning (DL) systems continues to rise, an increasing number of approaches are being proposed to test these systems, localise faults within them, and repair those faults. The best attestation of effectiveness for such techniques is an evaluation that showcases their capability to detect, localise and fix real faults. To facilitate these evaluations, the research community has collected multiple benchmarks of real faults in DL systems. In this work, we perform a manual analysis of 490 faults from five different benchmarks and identify that 314 of them are eligible for our study. Our investigation focuses specifically on how well the bugs correspond to the sources they were extracted from, which fault types are represented, and whether the bugs are reproducible. Our findings indicate that only 18.5% of the faults satisfy our realism conditions. Our attempts to…

Taxonomy

Topics: Fault Detection and Control Systems · Anomaly Detection Techniques and Applications · Software Engineering Research