TraceSim: A Method for Calculating Stack Trace Similarity

Roman Vasiliev; Dmitrij Koznov; George Chernishev; Aleksandr Khvorov; Dmitry Luciv; Nikita Povarov

arXiv:2009.12590 · cs.SE · September 29, 2020

TL;DR

TraceSim is a machine learning-based method that combines TF-IDF and Levenshtein distance to measure stack trace similarity, with the aim of improving the accuracy of automated crash report triaging in large-scale software systems.
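
To make the idea of combining TF-IDF with Levenshtein distance concrete, here is a minimal sketch. It is not the paper's formula: it assumes stack traces are lists of frame names, uses a smoothed IDF as the frame weight, and charges that weight for each edit. The names `idf_weights`, `weighted_levenshtein`, and `similarity` are hypothetical, and the learned component of TraceSim is not modeled.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Map each frame to a smoothed IDF weight: log((1 + N) / (1 + df)) + 1.

    Assumption: this smoothing is an illustrative choice, not the paper's.
    """
    n = len(corpus)
    df = Counter(frame for trace in corpus for frame in set(trace))
    return {frame: math.log((1 + n) / (1 + df[frame])) + 1.0 for frame in df}


def weighted_levenshtein(a, b, weight):
    """Edit distance over frame sequences where each edit costs the frame's weight."""
    def w(frame):
        return weight.get(frame, 1.0)  # assumed default for unseen frames

    # prev[j] is the cost of turning an empty prefix of `a` into b[:j].
    prev = [0.0]
    for fb in b:
        prev.append(prev[-1] + w(fb))

    for fa in a:
        curr = [prev[0] + w(fa)]
        for j, fb in enumerate(b, start=1):
            sub = 0.0 if fa == fb else w(fa) + w(fb)
            curr.append(min(prev[j] + w(fa),       # delete fa
                            curr[j - 1] + w(fb),   # insert fb
                            prev[j - 1] + sub))    # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b, weight):
    """Normalize the weighted distance into a score in [0, 1]."""
    total = sum(weight.get(f, 1.0) for f in a) + sum(weight.get(f, 1.0) for f in b)
    if total == 0:
        return 1.0  # two empty traces are treated as identical
    return 1.0 - weighted_levenshtein(a, b, weight) / total


corpus = [
    ["main", "run", "parse"],
    ["main", "run", "render"],
    ["main", "io", "read"],
]
weights = idf_weights(corpus)
```

With these weights, a frame that appears in every trace (`main`) contributes less to the score than a rare one (`parse`), so two traces that share only a rare frame come out more similar than two that share only a common frame.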

Contribution

The paper presents TraceSim, an approach that combines TF-IDF, Levenshtein distance, and machine learning for stack trace similarity assessment, implemented in an industrial triaging system.

Findings

1. Significantly outperforms baseline methods in accuracy
2. Effective in large-scale crash report triaging
3. Improves automation of bug report grouping

Abstract

Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products with large user bases, which is why many different approaches to automating this task have been proposed. Moreover, it is important to improve the quality of triaging because of the large volume of reports that must be processed properly. Therefore, even a relatively small improvement could play a significant role in the overall accuracy of report bucketing. The majority of existing studies use some kind of stack trace similarity metric, based either on information retrieval techniques or on string matching methods. However, it should be stressed that the quality of triaging is still insufficient.…
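
The grouping step the abstract describes can be sketched as threshold-based bucketing on top of any stack trace similarity metric. This is an illustration under stated assumptions, not the paper's procedure: the similarity here is a stand-in (`difflib.SequenceMatcher` over frame lists), each bucket is represented by its first report, and `frame_similarity`, `bucket_reports`, and the `0.7` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def frame_similarity(a, b):
    """Stand-in similarity in [0, 1] over two lists of frame names."""
    return SequenceMatcher(None, a, b).ratio()


def bucket_reports(traces, threshold=0.7):
    """Greedily assign each trace to the most similar existing bucket.

    A trace opens a new bucket when no bucket representative (the first
    trace placed in that bucket) reaches the threshold.
    """
    buckets = []
    for trace in traces:
        best, best_score = None, threshold
        for bucket in buckets:
            score = frame_similarity(trace, bucket[0])
            if score >= best_score:
                best, best_score = bucket, score
        if best is None:
            buckets.append([trace])
        else:
            best.append(trace)
    return buckets


reports = [
    ["main", "run", "parse", "read"],
    ["main", "run", "parse", "write"],
    ["boot", "init", "load"],
]
buckets = bucket_reports(reports)
```

Because the bucketing logic only calls `frame_similarity`, a more accurate metric can be swapped in without touching the grouping code, which is why improvements to the metric translate directly into bucketing accuracy.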
