Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces
Neetha Jambigi, Bartosz Bogacz, Moritz Mueller, Thomas Bach, Michael Felderer

TL;DR
This paper introduces a novel fault localization method using fine-tuned large language models trained on synthetic crash data, achieving significantly higher accuracy than baselines in identifying root causes from stack traces alone.
Contribution
The paper presents a new approach to fault localization that fine-tunes LLMs with synthetic crash data, enabling accurate root cause prediction solely from stack traces.
Findings
Achieved 66.9% accuracy on SAP HANA crashes, outperforming baselines.
Generalized the approach to open-source databases with 63% and 74% accuracy.
Fine-tuning outperformed prompting non-finetuned LLMs across datasets.
Abstract
Abrupt and unexpected terminations of software are termed software crashes, and they can be challenging to analyze. Finding the root cause requires extensive manual effort and expertise to connect information sources like stack traces, source code, and logs. Typical approaches to fault localization require either test failures or source code. Crashes occurring in production environments, such as that of SAP HANA, provide solely crash logs and stack traces. We present a novel approach to localize faults based only on the stack trace information and no additional runtime information, by fine-tuning large language models (LLMs). We address complex cases where the root cause of a crash differs from the technical cause and is not located in the innermost frame of the stack trace. As the number of historic crashes is insufficient to fine-tune LLMs, we augment our dataset by leveraging code…
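To make the task setup concrete, here is a minimal Python sketch of how a labeled stack trace might be turned into a fine-tuning record, and how a naive "blame the innermost frame" heuristic could be scored against the labels. It is an illustration only, not the authors' pipeline: the CrashExample type, the prompt wording, the record layout, and all function names in the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrashExample:
    """One labeled crash (hypothetical schema, not taken from the paper)."""
    frames: list[str]  # stack frames, innermost (crashing) frame first
    root_cause: str    # function labeled as the root cause


def to_finetuning_record(example: CrashExample) -> dict[str, str]:
    """Render a crash as a prompt/completion pair (assumed text format)."""
    lines = [f"#{i} {frame}" for i, frame in enumerate(example.frames)]
    prompt = (
        "Stack trace (innermost frame first):\n"
        + "\n".join(lines)
        + "\nRoot cause function:"
    )
    return {"prompt": prompt, "completion": " " + example.root_cause}


def innermost_frame_baseline(example: CrashExample) -> str:
    """Naive heuristic: blame the frame where the crash surfaced."""
    return example.frames[0]


def accuracy(examples: list[CrashExample], predict) -> float:
    """Fraction of crashes whose predicted function matches the label."""
    if not examples:
        return 0.0
    hits = sum(predict(e) == e.root_cause for e in examples)
    return hits / len(examples)


# Hypothetical sample data; in the first crash the root cause is NOT
# the innermost frame, which is the hard case the abstract describes.
examples = [
    CrashExample(["memcpy", "copy_row", "merge_delta"], "copy_row"),
    CrashExample(["parse_int", "read_config"], "parse_int"),
]

records = [to_finetuning_record(e) for e in examples]
baseline_accuracy = accuracy(examples, innermost_frame_baseline)
```

In this toy data the heuristic gets only the second crash right, which shows why a model has to reason over the whole trace rather than the top frame when the root cause and the technical cause differ.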
Taxonomy
Topics: Topic Modeling · Genomics and Phylogenetic Studies · Software Engineering Research
