Stack Trace Deduplication: Faster, More Accurately, and in More Realistic Scenarios

Egor Shibaev; Denis Sushentsev; Yaroslav Golubev; Aleksandr Khvorov

arXiv:2412.14802·cs.SE·December 20, 2024



TL;DR

This paper introduces a new stack trace deduplication model, a large industry dataset, and a comprehensive evaluation, demonstrating improved accuracy and speed in real-world scenarios for large-scale software error analysis.

Contribution

The work presents a novel two-part model, a new industry dataset, and a realistic evaluation framework for stack trace deduplication at scale.

Findings

01 Outperforms existing models on open-source and industry datasets.
02 Balances accuracy with operational speed effectively.
03 Handles large-scale, real-world stack trace data efficiently.

Abstract

In large-scale software systems, there are often no fully-fledged bug reports with human-written descriptions when an error occurs. In this case, developers rely on stack traces, i.e., series of function calls that led to the error. Since there can be tens and hundreds of thousands of them describing the same issue from different users, automatic deduplication into categories is necessary to allow for processing. Recent works have proposed powerful deep learning-based approaches for this, but they are evaluated and compared in isolation from real-life workflows, and it is not clear whether they will actually work well at scale. To overcome this gap, this work presents three main contributions: a novel model, an industry-based dataset, and a multi-faceted evaluation. Our model consists of two parts: (1) an embedding model with byte-pair encoding and approximate nearest neighbor search…
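The abstract is cut off above, so the model's details are not shown here. As a rough illustration of the retrieval idea it mentions (embed a stack trace, then look up its nearest neighbors among already grouped traces), a minimal sketch follows. Everything in it is an assumption made for illustration and not the authors' implementation: the bag-of-frames `embed` stands in for the learned embedding model with byte-pair encoding, the exhaustive scan stands in for approximate nearest neighbor search, and `assign_group`, `threshold`, and the frame names are hypothetical.

```python
import math
from collections import Counter


def embed(frames):
    """Toy embedding (assumption): L2-normalized bag-of-frames as a sparse dict."""
    counts = Counter(frames)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {f: c / norm for f, c in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two L2-normalized sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(f, 0.0) for f, w in a.items())


def assign_group(frames, groups, threshold=0.6):
    """Attach a trace to its most similar group, or open a new group.

    `groups` maps a group id to the list of embeddings already in it.
    The exhaustive scan below is a stand-in for an approximate
    nearest neighbor index; `threshold` is a hypothetical cutoff.
    """
    query = embed(frames)
    best_id, best_sim = None, -1.0
    for gid, members in groups.items():
        sim = max(cosine(query, m) for m in members)
        if sim > best_sim:
            best_id, best_sim = gid, sim
    if best_id is not None and best_sim >= threshold:
        groups[best_id].append(query)
        return best_id
    new_id = len(groups)
    groups[new_id] = [query]
    return new_id


# Hypothetical traces: the first two share two of three frames.
groups = {}
first = assign_group(["Main.run", "Service.load", "HashMap.get"], groups)
second = assign_group(["Main.run", "Service.load", "TreeMap.get"], groups)
third = assign_group(["Parser.parse", "Lexer.next"], groups)
```

In this toy run the first two traces land in the same group and the third opens a new one. At the scale the abstract describes, the linear scan would be the bottleneck, which is presumably why an approximate index is used for the first stage.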

Code & Models

Repositories

jetbrains-research/stack-trace-deduplication · PyTorch · Official

Taxonomy

Topics: Advanced Data Storage Technologies