Aggregation of Stack Trace Similarities for Crash Report Deduplication
Nikolay Karasov, Aleksandr Khvorov, Roman Vasiliev, Yaroslav Golubev, Timofey Bryksin

TL;DR
This paper introduces a novel method for deduplicating crash reports by aggregating similarities and timestamps of stack traces, significantly improving accuracy over existing solutions in real-world software development environments.
Contribution
It proposes a new aggregation-based approach to crash report deduplication that outperforms state-of-the-art methods, and it analyzes how individual features contribute to the result to guide further development.
Findings
Improved the recall rate by 15 percentage points on the NetBeans dataset
Improved deduplication accuracy by 8 percentage points on JetBrains data
The aggregation approach outperforms simpler k-NN methods
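The "recall rate" figures above are usually reported in the crash deduplication literature as RR@k: the fraction of incoming reports whose correct group appears among the top-k candidate groups. This summary does not state which k the paper uses or how reports that open a brand-new group are counted, so the following is only a generic sketch of the metric; the function name `recall_rate_at_k` and the toy data are hypothetical.

```python
from typing import Sequence


def recall_rate_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of queries whose correct group id is among the top-k ranked candidates.

    Generic RR@k sketch; the paper's exact variant is not specified here.
    """
    if len(rankings) != len(truths) or not rankings:
        raise ValueError("need one ranking/truth pair per query, and at least one query")
    if k < 1:
        raise ValueError("k must be >= 1")
    # A query is a hit if its true group id shows up in the first k candidates.
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(rankings)


# Hypothetical toy data: candidate groups ranked by a deduplication model.
rankings = [["g1", "g2"], ["g3", "g1"], ["g2", "g4"]]
truths = ["g1", "g1", "g5"]
print(recall_rate_at_k(rankings, truths, k=1))  # 0.3333333333333333
print(recall_rate_at_k(rankings, truths, k=2))  # 0.6666666666666666
```

Under this definition, a gain of 15 percentage points means, for example, moving RR@k from 0.60 to 0.75, not a 15% relative increase.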
Abstract
The automatic collection of stack traces in bug tracking systems is an integral part of many software projects and their maintenance. However, such reports often contain a lot of duplicates, and the problem of deduplicating them into groups arises. In this paper, we propose a new approach to solve the deduplication task and report on its use on real-world data from JetBrains, a leading developer of IDEs and other software. Unlike most of the existing methods, which assign the incoming stack trace to a particular group in which a single most similar stack trace is located, we use the information about all the calculated similarities to the group, as well as the information about the timestamp of the stack traces. This approach to aggregating all available information shows significantly better results compared to existing solutions. The aggregation improved the results over the…
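The contrast the abstract draws can be illustrated with a toy example: a nearest-neighbor baseline looks only at the single most similar stack trace in each group, whereas an aggregating scorer looks at all similarities to the group and at how recent the group's reports are. This is a minimal sketch under explicit assumptions, not the paper's method: Jaccard overlap of frame names stands in for a real stack trace similarity, and the feature set (max, mean, group size, recency) and the `weights` values are hypothetical. In practice such weights would presumably be fitted on labeled data.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

SECONDS_PER_DAY = 86_400.0


@dataclass
class Report:
    frames: List[str]   # stack trace reduced to frame names
    timestamp: float    # report time in seconds


Similarity = Callable[[Sequence[str], Sequence[str]], float]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Placeholder stack trace similarity: Jaccard overlap of frame sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def nearest_neighbor_group(query: Report, groups: Dict[str, List[Report]], sim: Similarity) -> str:
    """1-NN baseline: the group containing the single most similar stack trace."""
    return max(groups, key=lambda g: max(sim(query.frames, r.frames) for r in groups[g]))


def group_features(query: Report, reports: List[Report], sim: Similarity) -> List[float]:
    """Hypothetical features built from every similarity to the group, plus recency."""
    sims = [sim(query.frames, r.frames) for r in reports]
    age_days = (query.timestamp - max(r.timestamp for r in reports)) / SECONDS_PER_DAY
    return [
        max(sims),                      # the only signal the 1-NN baseline uses
        sum(sims) / len(sims),          # mean similarity to the whole group
        math.log1p(len(sims)),          # group size, with diminishing returns
        math.exp(-max(age_days, 0.0)),  # recency of the group's latest report
    ]


def aggregated_group(query: Report, groups: Dict[str, List[Report]],
                     sim: Similarity, weights: Sequence[float]) -> str:
    """Pick the group with the highest weighted sum of aggregated features."""
    def score(g: str) -> float:
        return sum(w * f for w, f in zip(weights, group_features(query, groups[g], sim)))
    return max(groups, key=score)


# Toy data: group A has one very similar but old report;
# group B has several moderately similar, recent reports.
query = Report(["a", "b", "c", "d"], timestamp=10 * SECONDS_PER_DAY)
groups = {
    "A": [Report(["a", "b", "c", "d", "e"], timestamp=0.0)],
    "B": [Report(["a", "b", "c", "x"], timestamp=9.5 * SECONDS_PER_DAY) for _ in range(3)],
}
weights = [1.0, 1.0, 0.5, 1.0]  # made-up weights for illustration only

print(nearest_neighbor_group(query, groups, jaccard))     # A
print(aggregated_group(query, groups, jaccard, weights))  # B
```

The baseline picks group A because its single report is the closest match (0.8 versus 0.6), while the aggregated score prefers group B (about 2.50 versus 1.95), since B is larger, consistently similar, and recently active. A linear score keeps the sketch readable; any model that consumes per-group features could take its place.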
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
