DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs

Jialun Cao; Meiziniu Li; Xiao Chen; Ming Wen; Yongqiang Tian; Bo Wu; Shing-Chi Cheung

arXiv:2205.01938 · cs.SE · May 5, 2022

TL;DR

DeepFD is a learning-based framework that automatically diagnoses and localizes faults in deep learning programs by analyzing runtime features during training, significantly improving accuracy over existing rule-based methods.

Contribution

DeepFD introduces a novel learning approach for fault diagnosis and localization in DL programs, moving beyond neuron-focused and rule-based methods to identify root causes effectively.

Findings

01. Correctly diagnoses 52% of faulty DL programs

02. Locates faults in 42% of faulty programs

03. Outperforms state-of-the-art methods in accuracy

Abstract

As Deep Learning (DL) systems are widely deployed for mission-critical applications, debugging such systems becomes essential. Most existing works identify and repair suspicious neurons in the trained Deep Neural Network (DNN), which, unfortunately, might be a detour. Specifically, several existing studies have reported that many unsatisfactory behaviors actually originate from faults residing in DL programs. Besides, locating faulty neurons is not actionable for developers, while locating the faulty statements in DL programs can provide developers with more useful information for debugging. Though a few recent studies were proposed to pinpoint the faulty statements in DL programs or the training settings (e.g., an overly large learning rate), they were mainly designed based on predefined rules, leading to many false alarms or false negatives, especially when the faults are beyond…
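
To make the idea of diagnosing faults from runtime features collected during training more concrete, the following is a minimal sketch and not DeepFD's actual implementation. It summarizes a per-epoch loss curve as a few numbers and assigns the label of the most similar labelled run, instead of applying a hand-written rule. The feature set, the nearest-neighbour classifier, the function names (`runtime_features`, `diagnose`), the fault labels, and the loss values are all assumptions made for illustration.

```python
import math
from statistics import pstdev


def runtime_features(losses):
    """Summarize a per-epoch training-loss curve as a small numeric vector.

    Hypothetical features (not taken from the paper):
    [saw NaN/inf, std of loss, fraction of epochs where loss rose, relative drop].
    """
    finite = [x for x in losses if math.isfinite(x)]
    if not finite:
        # Every recorded loss was NaN or inf.
        return [1.0, 0.0, 0.0, 0.0]
    diffs = [b - a for a, b in zip(finite, finite[1:])]
    frac_up = sum(d > 0 for d in diffs) / len(diffs) if diffs else 0.0
    rel_drop = (finite[0] - finite[-1]) / abs(finite[0]) if finite[0] != 0 else 0.0
    return [
        float(len(finite) != len(losses)),
        pstdev(finite),
        frac_up,
        rel_drop,
    ]


def diagnose(losses, labelled_runs):
    """Return the label of the labelled run whose features are closest (1-NN)."""
    query = runtime_features(losses)

    def distance(run):
        return math.dist(query, runtime_features(run[0]))

    return min(labelled_runs, key=distance)[1]


# Toy labelled runs; loss values and label names are made up for illustration.
labelled_runs = [
    ([2.0, 1.5, 1.1, 0.8, 0.6], "no_fault"),
    ([2.0, 3.5, 1.9, 4.2, 2.8], "learning_rate_too_large"),
]

print(diagnose([1.8, 2.9, 1.7, 3.6, 2.5], labelled_runs))
```

The point of the sketch is the design choice: the mapping from observed training behavior to a fault type is learned from labelled runs rather than encoded as fixed thresholds, which is the contrast with rule-based approaches that the abstract draws.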

Code & Models

Repositories

arabelatso/deepfd (official)

Taxonomy

Methods: Repair