An empirical study of fault localisation techniques for deep neural networks

Nargiz Humbatova; Jinhan Kim; Gunel Jahangirova; Shin Yoo; Paolo Tonella

PMC · DOI: 10.1007/s10664-025-10657-7 · June 10, 2025

Open Access

TL;DR

This paper evaluates fault localisation tools for deep neural networks and finds that their measured performance improves when alternative patches are accepted as ground truth.

Contribution

The study introduces a benchmark with real and mutated faults and shows the impact of using alternative patches for evaluation.

Findings

1. Using a single ground truth for evaluation leads to low recall and precision in fault localisation.

2. Considering alternative patches significantly improves the performance of fault localisation tools.

3. DeepFD is the most effective tool with an average recall of 0.55 and precision of 0.37.

Abstract

As Deep Neural Networks (DNNs) grow in popularity, so does the need for tools that assist developers in implementing, testing and debugging them. Several approaches have been proposed that automatically analyse and localise potential faults in DNNs under test. In this work, we evaluate and compare existing state-of-the-art fault localisation techniques, which operate based on both dynamic and static analysis of the DNN. The evaluation is performed on a benchmark consisting of both real faults obtained from bug reporting platforms and faulty models produced by a mutation tool. Our findings indicate that using a single, specific ground truth (e.g. the human-defined one) to evaluate DNN fault localisation tools results in fairly low performance (maximum average recall of 0.33 and precision of 0.21). However, such figures increase when…
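To illustrate why the choice of ground truth matters, the sketch below scores one tool report against a single human-defined patch and then against a set of alternative patches. It is only an illustration under stated assumptions: the excerpt does not define the metrics, so precision and recall are assumed to be set-based over fault labels, the best-scoring alternative is assumed to be the one that counts, and all fault labels and function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def precision_recall(reported: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of reported faults against one ground truth.

    Assumption: faults are compared as plain labels; the paper may match them differently.
    """
    hits = len(reported & truth)
    precision = hits / len(reported) if reported else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def best_over_alternatives(
    reported: Set[str], alternatives: Iterable[Set[str]]
) -> Tuple[float, float]:
    """Score against every alternative ground truth and keep the best match.

    Assumption: "best" means highest recall, with precision as the tie-breaker.
    """
    scores = [precision_recall(reported, set(gt)) for gt in alternatives]
    return max(scores, key=lambda pr: (pr[1], pr[0]), default=(0.0, 0.0))


# Hypothetical example: the tool blames the learning rate and an activation function.
reported = {"learning_rate", "activation"}
human_patch = {"optimizer"}
alternative_patches = [human_patch, {"learning_rate"}, {"activation", "epochs"}]

single = precision_recall(reported, human_patch)
multi = best_over_alternatives(reported, alternative_patches)
print("single ground truth (precision, recall):", single)
print("alternative patches (precision, recall):", multi)
```

In this toy case the report shares nothing with the human-defined patch, so both metrics are zero, yet it fully covers one of the alternative patches. The same tool output therefore looks far better once equivalent fixes are accepted as valid ground truth.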

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Software Testing and Debugging Techniques · Software Engineering Research