From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models

QiHong Chen; Lianghao Jiang; Iftekhar Ahmed

arXiv:2505.05679 · cs.SE · May 12, 2025

TL;DR

This paper evaluates large language models for clone code detection, identifies prompt bias as a key issue, and proposes a framework that improves detection accuracy by mitigating this bias.

Contribution

The paper introduces a framework that mitigates prompt bias in LLM-based clone detection, improving F1 scores by up to 10.81%.

Findings

01. Palm model achieved high F1 scores of 89.30 and 86.41 on two datasets.

02. Identified eight categories of prompt bias affecting LLM performance.

03. Proposed bias mitigation approach improved F1 scores by up to 10.81%.

Abstract

The issue of clone code has persisted in software engineering, primarily because developers often copy and paste code segments. This common practice has elevated the importance of clone code detection, garnering attention from both software engineering researchers and industry professionals. Their collective concern arises from the potential negative impacts that clone code can have on software quality. The emergence of powerful Generative Large Language Models (LLMs) like ChatGPT has exacerbated the clone code problem. These advanced models possess code generation capabilities that can inadvertently create code clones. As a result, the need to detect clone code has become more critical than ever before. In this study, we assess the suitability of LLMs for clone code detection. Our results demonstrate that the Palm model achieved a high F1 score of 89.30 for the avatar dataset and 86.41…

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Engineering Techniques and Practices