Stack Trace-Based Crash Deduplication with Transformer Adaptation

Md Afif Al Mamun; Gias Uddin; Lan Xia; Longyu Zhang

arXiv:2508.19449·cs.SE·August 28, 2025


TL;DR

This paper introduces dedupT, a transformer-based method that models stack traces holistically to improve crash report deduplication. It outperforms both traditional and deep learning approaches on real-world datasets.

Contribution

The paper presents dedupT, a novel transformer-based approach that adapts pretrained language models for effective stack trace deduplication, capturing structural relationships better than prior methods.

Findings

01

dedupT outperforms existing DL and traditional methods in duplicate ranking.

02

dedupT significantly reduces manual triage effort.

03

dedupT improves MRR and ROC-AUC on four public datasets.
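
For readers unfamiliar with the ranking metric named above, here is a minimal sketch of Mean Reciprocal Rank (MRR) using its standard definition: the average, over queries, of 1/rank of the first true duplicate. The function name, the input layout, and the choice to score a query with no retrieved duplicate as 0 are assumptions made for illustration; the paper's exact evaluation protocol is not described on this page.

```python
from typing import List, Sequence, Set


def mean_reciprocal_rank(rankings: Sequence[List[str]],
                         relevant: Sequence[Set[str]]) -> float:
    """Average of 1/rank of the first relevant item per query.

    rankings[i] is the ranked list of candidate crash IDs for query i;
    relevant[i] is the set of IDs that are true duplicates of query i.
    A query whose ranking contains no relevant ID contributes 0
    (an assumption of this sketch).
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked_ids, gold in zip(rankings, relevant):
        for position, crash_id in enumerate(ranked_ids, start=1):
            if crash_id in gold:
                total += 1.0 / position
                break  # only the first hit counts
    return total / len(rankings)


# Hypothetical data: first hits at rank 1, rank 3, and no hit at all.
example_rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
example_relevant = [{"a"}, {"z"}, {"n"}]
example_mrr = mean_reciprocal_rank(example_rankings, example_relevant)
```

With the hypothetical data above, the three queries contribute 1, 1/3, and 0, so the result is 4/9.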

Abstract

Automated crash reporting systems generate large volumes of duplicate reports, overwhelming issue-tracking systems and increasing developer workload. Traditional stack trace-based deduplication methods, relying on string similarity, rule-based heuristics, or deep learning (DL) models, often fail to capture the contextual and structural relationships within stack traces. We propose dedupT, a transformer-based approach that models stack traces holistically rather than as isolated frames. dedupT first adapts a pretrained language model (PLM) to stack traces, then uses its embeddings to train a fully-connected network (FCN) to rank duplicate crashes effectively. Extensive experiments on real-world datasets show that dedupT outperforms existing DL and traditional methods (e.g., sequence alignment and information retrieval techniques) in both duplicate ranking and unique crash detection,…
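
The ranking step described in the abstract can be illustrated with a small sketch. It assumes that each stack trace has already been turned into an embedding vector by the adapted PLM, and it uses cosine similarity as a stand-in for the trained FCN scorer, which is not specified on this page. All names (`cosine_score`, `rank_candidates`, `is_unique`) and the 0.5 threshold are hypothetical, and treating a low top score as a unique crash is an assumption rather than the authors' stated procedure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_score(a: Vector, b: Vector) -> float:
    """Stand-in scorer (NOT the paper's FCN): cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(
    query: Vector,
    candidates: Dict[str, Vector],
    score: Callable[[Vector, Vector], float] = cosine_score,
) -> List[Tuple[str, float]]:
    """Return (crash_id, score) pairs sorted from most to least similar."""
    scored = [(crash_id, score(query, emb)) for crash_id, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def is_unique(ranked: List[Tuple[str, float]], threshold: float = 0.5) -> bool:
    """Treat the query as a new crash if no candidate scores above the
    threshold (hypothetical value; assumed decision rule)."""
    return not ranked or ranked[0][1] < threshold


# Hypothetical 2-dimensional embeddings, for illustration only.
known_crashes = {"c1": [0.9, 0.1], "c2": [0.0, 1.0], "c3": [-1.0, 0.0]}
ranked = rank_candidates([1.0, 0.0], known_crashes)
ranked_ids = [crash_id for crash_id, _ in ranked]
```

Passing a different `score` callable, such as a wrapper around a trained network, would replace the cosine stand-in without changing the ranking logic.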
