Supporting Cross-language Cross-project Bug Localization Using Pre-trained Language Models
Mahinthan Chandramohan, Dai Quoc Nguyen, Padmanabhan Krishnan, and Jovan Jancic

TL;DR
This paper introduces a bug localization method based on pre-trained language models (PLMs). The method generalizes across projects and programming languages, improves accuracy by analyzing commit messages alongside code, and is optimized for practical deployment.
Contribution
It presents a novel PLM-based bug localization approach using contrastive learning, a new ranking method combining commit messages and code, and a knowledge distillation technique for model size reduction.
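The contrastive-learning component trains the encoder so that each bug report's embedding lies close to the code that fixed it and far from unrelated code. This summary does not give the paper's exact objective. The sketch below uses one common choice, an in-batch InfoNCE loss. It assumes that each bug report and its matching code snippet are already encoded as vectors. The function names and the temperature of 0.05 are illustrative and are not taken from the paper.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(report_embs: list[list[float]], code_embs: list[list[float]],
                  temperature: float = 0.05) -> float:
    """Hypothetical in-batch InfoNCE loss (an assumed stand-in, not necessarily the paper's loss).

    Report i is paired with code i. Every other code snippet in the batch
    acts as a negative for report i.
    """
    n = len(report_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(report_embs[i], code_embs[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log softmax probability of the true (report, code) pair.
        total += log_denom - logits[i]
    return total / n


reports = [[1.0, 0.0], [0.0, 1.0]]
codes = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce_loss(reports, codes)         # report i matches code i
shuffled = info_nce_loss(reports, codes[::-1])  # pairs deliberately mismatched
print(f"aligned={aligned:.4f} shuffled={shuffled:.4f}")
```

When reports and code are correctly paired, the loss is close to zero. When the pairs are shuffled, the loss is large. Minimizing this gap is the training signal.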
Findings
Achieves higher bug localization accuracy than existing methods.
Demonstrates strong generalizability across unseen projects and languages.
Offers a CPU-compatible, efficient model suitable for real-world deployment.
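The abstract attributes the smaller, deployable model to knowledge distillation, in which a compact student model is trained to mimic a larger teacher. The distillation target is not described here. The sketch below therefore uses the classic soft-target formulation. It assumes that the teacher and student each produce relevance logits over the same candidate code segments for one bug report, and that `label` indexes the truly buggy candidate. The temperature and `alpha` defaults are hypothetical.

```python
import math


def log_softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled log-softmax computed with the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(student_logits: list[float], teacher_logits: list[float], label: int,
                      temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Hypothetical KD loss: soft-target KL (teacher -> student) blended with hard-label CE."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL(teacher || student) on softened distributions. The T^2 factor keeps its
    # gradient scale comparable to the hard-label term as the temperature changes.
    soft = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s)) * temperature ** 2
    # Ordinary cross-entropy of the student against the known buggy candidate.
    hard = -log_softmax(student_logits)[label]
    return alpha * soft + (1 - alpha) * hard


teacher = [3.0, 0.5, -1.0]  # large model's relevance logits over 3 candidate segments
student = [1.0, 0.8, 0.2]   # small model's logits for the same candidates
print(distillation_loss(student, teacher, label=0))
```

Students that are closer to the teacher, and more confident in the buggy candidate, receive a lower loss. The paper may instead distill at the embedding level, which this sketch does not cover.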
Abstract
Automatically locating a bug within a large codebase remains a significant challenge for developers. Existing techniques often generalize poorly and are hard to deploy because they rely on application-specific data and large models. This paper proposes a novel bug localization technique based on a pre-trained language model (PLM) that transcends project and language boundaries. Our approach leverages contrastive learning to enhance the representations of bug reports and source code. It then applies a novel ranking approach that combines commit messages and code segments. Additionally, we introduce a knowledge distillation technique that reduces model size for practical deployment without compromising performance. The technique offers several key benefits. By incorporating code segment and commit message analysis alongside traditional file-level examination, our technique…
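The ranking step scores candidate files using both their code segments and the commit messages that touched them. This summary does not specify how the paper combines the two signals. The sketch below assumes precomputed embeddings and a hypothetical weighted sum (`alpha`) of two scores: the best-matching segment similarity and the best-matching commit-message similarity. `CandidateFile` and `rank_files` are illustrative names, not the authors' API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CandidateFile:
    """Hypothetical container: one source file and its precomputed embeddings."""
    path: str
    segment_embs: list[list[float]]  # one vector per code segment (chunk) of the file
    commit_embs: list[list[float]] = field(default_factory=list)  # commit messages touching the file


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_files(report_emb: list[float], files: list[CandidateFile], alpha: float = 0.7) -> list[str]:
    """Rank files by a weighted mix of code-segment and commit-message similarity (assumed scheme)."""
    scored = []
    for f in files:
        # Best-matching segment: one relevant function can make the whole file relevant.
        code_score = max(cosine(report_emb, s) for s in f.segment_embs)
        # Best-matching commit message. Files with no commit history get a neutral 0.0.
        commit_score = max((cosine(report_emb, c) for c in f.commit_embs), default=0.0)
        scored.append((alpha * code_score + (1 - alpha) * commit_score, f.path))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [path for _, path in scored]


report = [1.0, 0.0]
files = [
    CandidateFile("ui/Button.java", [[0.0, 1.0]], [[0.1, 1.0]]),
    CandidateFile("core/Parser.java", [[0.0, 1.0], [0.8, 0.2]], [[1.0, 0.1]]),
]
print(rank_files(report, files))  # ['core/Parser.java', 'ui/Button.java']
```

Taking the maximum over segments lets one highly relevant function outweigh a large file of unrelated code. When two files have equally similar code, the commit-message term breaks the tie.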
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Natural Language Processing Techniques
Methods: Contrastive Learning · Knowledge Distillation
