Principal Context-aware Diffusion Guided Data Augmentation for Fault Localization
Shihao Fu, Yan Lei

TL;DR
This paper introduces PCD-DAug, a novel data augmentation method that synthesizes failing test cases using principal context and diffusion models to address class imbalance in fault localization, significantly improving its effectiveness.
Contribution
It proposes a new approach combining program slicing, principal component analysis, and diffusion models to generate failing test cases for fault localization.
Findings
Significant improvements in fault localization accuracy across six approaches.
Average top-1, top-3, and top-5 improvements of over 200%.
Effective handling of class imbalance in test cases.
Abstract
Test cases are indispensable for conducting effective fault localization (FL). However, test cases in practice are severely class imbalanced, i.e., the number of failing test cases (the minority class) is much smaller than that of passing ones (the majority class). The severe class imbalance between failing and passing test cases has hindered FL effectiveness. To address this issue, we propose PCD-DAug: a Principal Context-aware Diffusion guided Data Augmentation approach that generates synthesized failing test cases for improving FL. PCD-DAug first combines program slicing with principal component analysis to construct a principal context that shows how a set of statements influences the faulty output via statistical program dependencies. Then, PCD-DAug devises a conditional diffusion model to learn from principal contexts for generating synthesized failing test cases and acquiring…
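To make the "principal context" step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary coverage matrix (tests by statements) and a precomputed program slice given as statement indices, then uses PCA (via SVD in NumPy) to keep the k sliced statements with the largest variance-weighted loadings. The function name `principal_context`, the scoring rule, and the toy data are all hypothetical; the paper may select statements differently, and the conditional diffusion model is not shown.

```python
import numpy as np


def principal_context(coverage, slice_idx, k):
    """Illustrative only: choose k statements from a slice by PCA loadings.

    coverage  : (n_tests, n_statements) 0/1 array, 1 = statement executed
    slice_idx : statement indices kept by program slicing (assumed given)
    k         : number of statements to keep in the context
    """
    sub = np.asarray(coverage, dtype=float)[:, slice_idx]
    centered = sub - sub.mean(axis=0)

    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    total = var.sum()
    if total == 0:
        # No variation across tests: fall back to the first k sliced statements.
        return sorted(int(i) for i in slice_idx[:k])

    # Hypothetical scoring: absolute loadings weighted by explained variance.
    weights = var / total
    scores = (np.abs(vt) * weights[:, None]).sum(axis=0)

    top = np.argsort(-scores, kind="stable")[:k]
    return sorted(int(slice_idx[i]) for i in top)


# Toy data: 6 tests, 5 statements; statement 0 is executed by every test.
coverage = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
])
context = principal_context(coverage, [0, 2, 3, 4], k=3)
```

In this toy example, statement 0 is covered by every test, so it carries no variance and is dropped from the context, while the three varying statements in the slice are kept.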
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Software System Performance and Reliability
