Stack Trace-Based Crash Deduplication with Transformer Adaptation
Md Afif Al Mamun, Gias Uddin, Lan Xia, Longyu Zhang

TL;DR
This paper introduces dedupT, a transformer-based method that models stack traces holistically to improve crash report deduplication, outperforming traditional and deep learning approaches on real-world datasets.
Contribution
The paper presents dedupT, a novel transformer-based approach that adapts pretrained language models for effective stack trace deduplication, capturing structural relationships better than prior methods.
Findings
dedupT outperforms existing DL and traditional methods in duplicate ranking.
Significantly reduces manual triage effort.
Improved MRR and ROC-AUC metrics on four public datasets.
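For readers unfamiliar with the ranking metric named above, here is a minimal sketch of mean reciprocal rank (MRR) in its standard form. The function name and input layout are illustrative assumptions, not the paper's evaluation code, and the paper's exact protocol (for example, how queries with no duplicate are handled) may differ.

```python
def mean_reciprocal_rank(rankings, truths):
    """Standard MRR: average of 1/rank of the first correct item per query.

    rankings: list of ranked candidate-ID lists, best match first (hypothetical layout).
    truths:   list of the correct candidate ID for each query.
    A query whose correct ID is missing from its ranking contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        for position, candidate_id in enumerate(ranked, start=1):
            if candidate_id == truth:
                total += 1.0 / position
                break
    return total / len(rankings)


# Three queries: correct item at rank 1, at rank 2, and absent.
example_rankings = [["a", "b"], ["b", "a"], ["c", "d"]]
example_truths = ["a", "a", "x"]
mrr = mean_reciprocal_rank(example_rankings, example_truths)
```

A higher MRR means the true duplicate tends to appear nearer the top of the ranked list, which is what reduces the number of candidates a triager has to inspect.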
Abstract
Automated crash reporting systems generate large volumes of duplicate reports, overwhelming issue-tracking systems and increasing developer workload. Traditional stack trace-based deduplication methods, relying on string similarity, rule-based heuristics, or deep learning (DL) models, often fail to capture the contextual and structural relationships within stack traces. We propose dedupT, a transformer-based approach that models stack traces holistically rather than as isolated frames. dedupT first adapts a pretrained language model (PLM) to stack traces, then uses its embeddings to train a fully-connected network (FCN) to rank duplicate crashes effectively. Extensive experiments on real-world datasets show that dedupT outperforms existing DL and traditional methods (e.g., sequence alignment and information retrieval techniques) in both duplicate ranking and unique crash detection,…
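The two tasks named in the abstract, duplicate ranking and unique crash detection, can be illustrated with a small sketch. It assumes each stack trace has already been turned into an embedding vector; cosine similarity stands in for the trained FCN scorer, and the threshold value, function names, and data layout are all hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity; used here only as a stand-in for a learned scorer."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_emb, candidates, score_fn=cosine):
    """Return (candidate_id, score) pairs sorted from most to least similar."""
    scored = [(cid, score_fn(query_emb, emb)) for cid, emb in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def triage(query_emb, candidates, threshold=0.8, score_fn=cosine):
    """Return the best duplicate ID, or None if the crash looks unique.

    The threshold is an assumed tuning knob: if even the top-ranked
    candidate scores below it, the query is treated as a new crash.
    """
    ranking = rank_candidates(query_emb, candidates, score_fn)
    if not ranking or ranking[0][1] < threshold:
        return None, ranking
    return ranking[0][0], ranking


# Toy embeddings for two known crash groups (made-up values).
known = {"crash-A": [1.0, 0.0], "crash-B": [0.0, 1.0]}
match, _ = triage([1.0, 0.1], known)   # close to crash-A
unique, _ = triage([1.0, 1.0], known)  # equally far from both
```

In a setup like the one the abstract describes, `score_fn` would be replaced by the trained network applied to embeddings from the adapted language model; the ranking and thresholding logic around it would stay the same.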
