GPTrace: Effective Crash Deduplication Using LLM Embeddings
Patrick Herter, Vincent Ahlrichs, Ridvan Açilan, Julian Horsch

TL;DR
GPTrace introduces a novel crash deduplication method using large language model embeddings to improve accuracy over traditional and state-of-the-art techniques, significantly reducing manual effort in fuzzing analysis.
Contribution
This work presents GPTrace, a new workflow that leverages LLM embeddings for crash similarity assessment, outperforming existing stack trace and complex deduplication methods.
Findings
GPTrace achieves higher deduplication accuracy than traditional methods.
The approach effectively clusters over 300,000 crash inputs across multiple targets.
LLM-based embeddings provide flexible and improved crash similarity evaluation.
Abstract
Fuzzing is a highly effective method for uncovering software vulnerabilities, but analyzing the resulting data typically requires substantial manual effort. This is amplified by the fact that fuzzing campaigns often find a large number of crashing inputs, many of which share the same underlying bug. Crash deduplication is the task of finding such duplicate crashing inputs and thereby reducing the data that needs to be examined. Many existing deduplication approaches rely on comparing stack traces or other information that is collected when a program crashes. Although various metrics for measuring the similarity of such pieces of information have been proposed, many do not yield satisfactory deduplication results. In this work, we present GPTrace, a deduplication workflow that leverages a large language model to evaluate the similarity of various data sources associated with crashes by…
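To make the idea of embedding-based deduplication concrete, here is a minimal sketch of how crashes might be grouped once each one has an embedding vector. It is an illustration under stated assumptions, not GPTrace's actual pipeline: this excerpt does not say which embedding model, distance metric, or clustering algorithm the authors use. The cosine distance, the single-linkage grouping, the `threshold` value, the crash identifiers, and the hand-written three-dimensional vectors are all hypothetical stand-ins for real LLM embeddings.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cluster_crashes(embeddings, threshold):
    """Group crash IDs whose embeddings lie closer than `threshold`.

    Assumption: single-linkage grouping via union-find. Any two crashes
    connected by a chain of below-threshold pairs end up in one cluster.
    """
    ids = list(embeddings)
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if cosine_distance(embeddings[a], embeddings[b]) < threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical toy vectors standing in for embeddings of crash data
# (for example, stack traces); real embeddings have far more dimensions.
toy_embeddings = {
    "crash_001": [0.90, 0.10, 0.00],
    "crash_002": [0.88, 0.12, 0.01],
    "crash_003": [0.00, 0.20, 0.95],
    "crash_004": [0.02, 0.18, 0.97],
}

clusters = cluster_crashes(toy_embeddings, threshold=0.05)
print(clusters)
```

With these toy vectors the four crashes collapse into two groups, so an analyst would inspect two representatives instead of four inputs. The pairwise loop is quadratic in the number of crashes, which is fine for a sketch but would need a more scalable approach for campaigns with hundreds of thousands of crashing inputs.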
Taxonomy
Topics: Software Testing and Debugging Techniques · Web Application Security Vulnerabilities · Software Engineering Research
