Supporting Cross-language Cross-project Bug Localization Using Pre-trained Language Models

Mahinthan Chandramohan, Dai Quoc Nguyen, Padmanabhan Krishnan, and Jovan Jancic

arXiv:2407.02732·cs.SE·July 4, 2024

TL;DR

This paper introduces a pre-trained language model-based bug localization method that is highly generalizable across projects and languages, improves accuracy by combining code and commit message analysis, and is optimized for practical deployment.

Contribution

It presents a novel PLM-based bug localization approach using contrastive learning, a new ranking method combining commit messages and code, and a knowledge distillation technique for model size reduction.
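
To make the contrastive learning idea concrete, here is a minimal sketch of one common contrastive objective (InfoNCE with in-batch negatives) applied to paired bug-report and code embeddings. This summary does not state which loss the authors use, so the function names, the cosine similarity, and the `temperature` value are all assumptions for illustration, not the paper's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(report_vecs, code_vecs, temperature=0.05):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    report_vecs[i] and code_vecs[i] are assumed to be a matching pair;
    every other code vector in the batch acts as a negative.
    """
    n = len(report_vecs)
    total = 0.0
    for i in range(n):
        logits = [
            cosine(report_vecs[i], code_vecs[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the true pair.
        total += log_denom - logits[i]
    return total / n


# Toy embeddings: pairs that line up give a much lower loss than swapped pairs.
reports = [[1.0, 0.0], [0.0, 1.0]]
aligned_code = [[1.0, 0.0], [0.0, 1.0]]
swapped_code = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = info_nce_loss(reports, aligned_code)
swapped_loss = info_nce_loss(reports, swapped_code)
```

Minimizing a loss of this shape pulls each bug report toward its own code and pushes it away from the other code in the batch, which is the general effect contrastive training aims for.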

Findings

01 Achieves higher bug localization accuracy than existing methods.

02 Demonstrates strong generalizability across unseen projects and languages.

03 Offers a CPU-compatible, efficient model suitable for real-world deployment.

Abstract

Automatically locating a bug within a large codebase remains a significant challenge for developers. Existing techniques often struggle with generalizability and deployment due to their reliance on application-specific data and large model sizes. This paper proposes a novel pre-trained language model (PLM) based technique for bug localization that transcends project and language boundaries. Our approach leverages contrastive learning to enhance the representation of bug reports and source code. It then utilizes a novel ranking approach that combines commit messages and code segments. Additionally, we introduce a knowledge distillation technique that reduces model size for practical deployment without compromising performance. This paper presents several key benefits. By incorporating code segment and commit message analysis alongside traditional file-level examination, our technique…
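
As a rough illustration of the knowledge distillation step mentioned in the abstract, the sketch below computes the classic soft-target distillation loss: the KL divergence between temperature-softened teacher and student score distributions. The abstract does not say which distillation objective the authors use, so this is an assumed, generic formulation; `distillation_loss`, `softmax`, and the `temperature` default are hypothetical names and values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions (generic sketch).

    The temperature**2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# A student that reproduces the teacher's scores incurs zero loss;
# a student that ranks candidates differently is penalized.
teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
diverging_student = [0.5, 1.0, 4.0]
loss_match = distillation_loss(teacher, matching_student)
loss_diverge = distillation_loss(teacher, diverging_student)
```

In a setup like this, a smaller student model is trained to reproduce the larger model's relative scores, which is one way to shrink a model while trying to preserve its ranking behavior.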

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Natural Language Processing Techniques

Methods: Contrastive Learning · Knowledge Distillation