Distinguishing LLM-generated from Human-written Code by Contrastive Learning
Xiaodan Xu, Chao Ni, Xinrong Guo, Shaoxuan Liu, Xiaoya Wang, Kui Liu, Xiaohu Yang

TL;DR
This paper introduces CodeGPTSensor, a contrastive learning-based detector that effectively distinguishes ChatGPT-generated code from human-written code, addressing a gap in existing detection methods focused on natural language rather than programming code.
Contribution
The paper presents a novel detector for ChatGPT-generated code using contrastive learning and a semantic encoder, supported by a large-scale dataset for evaluation.
Findings
CodeGPTSensor outperforms baseline detectors in accuracy.
Large-scale HMCorp dataset enables comprehensive analysis.
Contrastive learning effectively captures differences between human and machine code.
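To make the contrastive-learning idea above concrete, here is a minimal sketch of a generic margin-based pairwise contrastive loss. It is not the loss used by CodeGPTSensor, whose exact objective this summary does not state. The function names, the margin value, and the two-dimensional toy vectors (standing in for embeddings that a semantic encoder such as UniXcoder would produce) are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Generic pairwise contrastive loss (assumed form, not the paper's).

    Pairs from the same class (both human or both machine) are pulled
    together: the loss is the squared distance.
    Pairs from different classes are pushed apart until they are at least
    `margin` away: the loss is the squared shortfall, or 0 beyond the margin.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical toy embeddings; a real detector would use encoder outputs.
human_1 = [0.0, 0.0]
human_2 = [0.3, 0.4]    # close to human_1
machine_1 = [3.0, 4.0]  # far from human_1

loss_same = contrastive_loss(human_1, human_2, same_class=True)
loss_diff_far = contrastive_loss(human_1, machine_1, same_class=False)
loss_diff_near = contrastive_loss(human_1, human_2, same_class=False)

print(loss_same, loss_diff_far, loss_diff_near)
```

In this sketch a human/machine pair that is already well separated contributes no loss, while a pair labeled as different but sitting close together is penalized, which is the intuition behind using a contrastive objective to separate the two classes of code.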
Abstract
Large language models (LLMs), such as ChatGPT released by OpenAI, have attracted significant attention from both industry and academia due to their demonstrated ability to generate high-quality content for various tasks. Despite the impressive capabilities of LLMs, there are growing concerns regarding their potential risks in various fields, such as news, education, and software engineering. Recently, several commercial and open-source LLM-generated content detectors have been proposed, which, however, are primarily designed for detecting natural language content without considering the specific characteristics of program code. This paper aims to fill this gap by proposing a novel ChatGPT-generated code detector, CodeGPTSensor, based on a contrastive learning framework and a semantic encoder built with UniXcoder. To assess the effectiveness of CodeGPTSensor on differentiating…
Taxonomy
Topics: Natural Language Processing Techniques
