Identifying Concurrency Bug Reports via Linguistic Patterns
Shuai Shao, Lu Xiao, Tingting Yu

TL;DR
This paper introduces a linguistic-pattern-based framework for automatically identifying concurrency bug reports, leveraging a taxonomy of patterns and fine-tuned language models to improve detection accuracy in large-scale open-source projects.
Contribution
It presents a comprehensive taxonomy of linguistic patterns, a novel fine-tuning strategy for language models, and a labeled dataset for concurrency bug report classification.
Findings
Fine-tuned pre-trained language models (PLMs) augmented with linguistic patterns achieve 91-93% precision.
Linguistic patterns significantly enhance bug report classification accuracy.
The approach maintains high precision on new, unseen data.
Abstract
With the growing ubiquity of multi-core architectures, concurrent systems have become essential but increasingly prone to complex issues such as data races and deadlocks. While modern issue-tracking systems facilitate the reporting of such problems, labeling concurrency-related bug reports remains a labor-intensive and error-prone task. This paper presents a linguistic-pattern-based framework for automatically identifying concurrency bug reports. We derive 58 distinct linguistic patterns from 730 manually labeled concurrency bug reports, organized across four levels: word-level (keywords), phrase-level (n-grams), sentence-level (semantic), and bug report-level (contextual). To assess their effectiveness, we evaluate four complementary approaches (matching, learning, prompt-based, and fine-tuning) spanning traditional machine learning, large language models (LLMs), and pre-trained language…
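To make the "matching" idea concrete, here is a minimal sketch of how word-level and phrase-level patterns might be applied to a bug report. The patterns, function names, and decision rule below are hypothetical illustrations; they are not the paper's 58 patterns or its actual matching procedure, and the sentence-level and report-level patterns are not covered.

```python
import re
from typing import Dict, List

# Hypothetical patterns for illustration only; NOT taken from the paper.
WORD_PATTERNS = [r"\bdeadlocks?\b", r"\bmutex(es)?\b", r"\bsynchronized\b"]
PHRASE_PATTERNS = [
    r"\bdata races?\b",
    r"\brace conditions?\b",
    r"\bthread[- ]safe(ty)?\b",
    r"\bconcurrent (access|modification)\b",
]


def match_levels(report: str) -> Dict[str, List[str]]:
    """Return the first matched text for each pattern, grouped by level."""
    text = report.lower()
    hits: Dict[str, List[str]] = {"word": [], "phrase": []}
    for level, patterns in (("word", WORD_PATTERNS), ("phrase", PHRASE_PATTERNS)):
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                hits[level].append(match.group(0))
    return hits


def is_concurrency_report(report: str) -> bool:
    """Assumed decision rule: flag the report if any pattern at any level matches."""
    return any(match_levels(report).values())
```

A report such as "App hangs: deadlock between two worker threads" would be flagged by the word-level list, while a report about a typo in a label would not. A pure matching rule like this is likely to be brittle on its own, which is presumably why the abstract also evaluates learning-based, prompt-based, and fine-tuning approaches.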
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
