TL;DR
TraceSim is a stack trace similarity metric that combines TF-IDF, Levenshtein distance, and machine learning to make automated crash report triaging more accurate in large-scale software systems.
Contribution
The paper presents TraceSim, a stack trace similarity metric that combines TF-IDF, Levenshtein distance, and machine learning, and reports that it has been implemented in an industrial triaging system.
Findings
Significantly outperforms baseline methods in accuracy
Effective in large-scale crash report triaging
Improves automation of bug report grouping
Abstract
Many contemporary software products have subsystems for automatic crash reporting. However, it is well-known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products that have large userbases, which is the reason for many different approaches to automating this task. Moreover, it is important to improve quality of triaging due to the big volume of reports that needs to be processed properly. Therefore, even a relatively small improvement could play a significant role in overall accuracy of report bucketing. The majority of existing studies use some kind of a stack trace similarity metric, either based on information retrieval techniques or string matching methods. However, it should be stressed that the quality of triaging is still insufficient.…
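To make the two families of metrics mentioned in the abstract concrete, the sketch below combines an information-retrieval weight (inverse document frequency per frame) with a string-matching distance (Levenshtein distance over frames). It is an illustrative toy under stated assumptions, not the TraceSim formula: the function names `idf_weights`, `weighted_levenshtein`, and `similarity`, the `log(1 + n / count)` weighting, the substitution cost, and the normalization are all hypothetical choices made for this example, and no learned parameters are involved.

```python
import math
from collections import Counter


def idf_weights(traces):
    """Inverse document frequency of each frame over a corpus of stack traces.

    Assumption: weight = log(1 + n / count), so rare frames weigh more and
    every weight is strictly positive.
    """
    n = len(traces)
    doc_freq = Counter(frame for trace in traces for frame in set(trace))
    return {frame: math.log(1 + n / count) for frame, count in doc_freq.items()}


def weighted_levenshtein(a, b, weight):
    """Levenshtein distance over frames with per-frame costs.

    Inserting or deleting a frame costs weight(frame); substituting one frame
    for a different one costs the sum of both weights (an assumption).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [0.0]
    for frame in b:
        prev.append(prev[-1] + weight(frame))
    for fa in a:
        cur = [prev[0] + weight(fa)]
        for j, fb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if fa == fb else weight(fa) + weight(fb))
            delete = prev[j] + weight(fa)
            insert = cur[j - 1] + weight(fb)
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


def similarity(a, b, idf):
    """Map the weighted distance to [0, 1]; 1.0 means identical traces."""
    # Frames unseen in the corpus get the largest known weight (an assumption).
    fallback = max(idf.values(), default=1.0)

    def weight(frame):
        return idf.get(frame, fallback)

    total = sum(weight(f) for f in a) + sum(weight(f) for f in b)
    if total == 0:
        return 1.0  # both traces are empty
    return 1.0 - weighted_levenshtein(a, b, weight) / total


# Hypothetical traces, outermost frame first.
corpus = [
    ["main", "parse", "read"],
    ["main", "parse", "write"],
    ["main", "render", "draw"],
]
idf = idf_weights(corpus)
close = similarity(corpus[0], corpus[1], idf)  # share "main" and "parse"
far = similarity(corpus[0], corpus[2], idf)    # share only "main"
```

In this toy corpus, `close` comes out higher than `far` because the first pair of traces shares two frames, while the second pair shares only the very common `main` frame, which carries the lowest weight. A bucketing system could compare such a score against a threshold to decide whether two reports belong to the same group; choosing that threshold is outside the scope of this sketch.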
