Assessing the Code Clone Detection Capability of Large Language Models

Zixian Zhang; Takfarinas Saber

arXiv:2407.02402·cs.SE·July 3, 2024

TL;DR

This paper evaluates how well GPT-3.5 and GPT-4 detect code clones. GPT-4 performs better, but both models still struggle with complex clone types, especially in human-written code.

Contribution

It provides a comparative analysis of LLMs' code clone detection performance across different clone types and datasets, highlighting current limitations.

Findings

01 GPT-4 outperforms GPT-3.5 in clone detection

02 Both models struggle with complex Type-4 clones

03 Models perform better on LLM-generated code than human-generated code
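
For readers unfamiliar with the clone taxonomy: in the usual classification, Type-1 clones differ only in whitespace and comments, Type-2 clones also differ in identifiers and literals, Type-3 clones have statements added, removed, or changed, and Type-4 clones implement the same behaviour with different syntax. The pair below is a minimal sketch of what a Type-4 clone can look like. Both functions are hypothetical illustrations written for this summary; they are not taken from the paper, BigCloneBench, or GPTCloneBench.

```python
# Hypothetical illustration of a Type-4 (semantic) clone pair.
# Neither function comes from the paper or its datasets.


def sum_to_n_loop(n: int) -> int:
    """Add the integers 1..n one at a time; returns 0 when n < 1."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_n_formula(n: int) -> int:
    """Return the same sum via the closed form n(n+1)/2; returns 0 when n < 1."""
    if n < 1:
        return 0
    return n * (n + 1) // 2
```

The two functions share almost no tokens or structure, so a detector has to reason about behaviour rather than surface similarity to recognise them as clones.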

Abstract

This study assesses the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, on the task of code clone detection. The models are tested on code pairs covering different clone types and levels of similarity, sourced from two datasets: BigCloneBench (human-made) and GPTCloneBench (LLM-generated). The findings indicate that GPT-4 consistently surpasses GPT-3.5 across all clone types. The GPT models' accuracy at identifying code clones correlates with code similarity, and both models are largely ineffective at detecting the most complex Type-4 code clones. Additionally, the GPT models perform better at identifying code clones in LLM-generated code than in human-generated code. However, they do not reach impressive accuracy. These results emphasize the imperative for ongoing…
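
A per-clone-type comparison like the one described in the abstract could be tallied roughly as follows. This is a minimal sketch, not the authors' evaluation code: the record layout, the function name `accuracy_by_clone_type`, and the toy records are all assumptions made for illustration, and the paper may use additional metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed record layout: (clone type label, ground-truth is-clone flag,
# model's predicted is-clone flag). This layout is hypothetical.
Record = Tuple[str, bool, bool]


def accuracy_by_clone_type(records: Iterable[Record]) -> Dict[str, float]:
    """Return the fraction of correct predictions for each clone type."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for clone_type, is_clone, predicted in records:
        total[clone_type] += 1
        if predicted == is_clone:
            correct[clone_type] += 1
    return {clone_type: correct[clone_type] / total[clone_type] for clone_type in total}


# Made-up toy records for demonstration only; these are NOT results from the paper.
toy_records = [
    ("Type-1", True, True),
    ("Type-4", True, False),
    ("Type-4", True, True),
]
toy_accuracy = accuracy_by_clone_type(toy_records)
```

Grouping by clone type before averaging keeps an easy category such as Type-1 from masking weak results on Type-4 pairs.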

Taxonomy

Topics: Data Quality and Management

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout