One Size Does Not Fit All: Investigating Efficacy of Perplexity in Detecting LLM-Generated Code

Jinwei Xu; He Zhang; Yanjing Yang; Lanxin Yang; Zeru Cheng; Jun Lyu; Bohan Liu; Xin Zhou; Alberto Bacchelli; Yin Kia Chiam; Thiam Kian Chiew

arXiv:2412.16525 · cs.SE · July 16, 2025

TL;DR

This paper evaluates the effectiveness of the perplexity-based method for detecting large language model-generated code, revealing its strengths in generalization but limitations in accuracy and speed across various realistic scenarios.

Contribution

It provides the first large-scale analysis of perplexity-based detection, comparing it with other methods across multiple criteria and offering practical recommendations for improvement.

Findings

01. PERPLEXITY has the best generalization capability.

02. PERPLEXITY shows limited detection accuracy.

03. PERPLEXITY is unsuitable for high-level languages.

Abstract

Large language model-generated code (LLMgCode) has become increasingly common in software development. So far, LLMgCode has shown more quality issues than human-authored code (HaCode). LLMgCode is often mixed with HaCode in a single code change that is signed only by human developers and not carefully examined. Many automated methods have been proposed to distinguish LLMgCode from HaCode, among which the perplexity-based method (PERPLEXITY for short) is the state of the art. However, evaluations of PERPLEXITY have focused on detection accuracy, so it remains unclear whether PERPLEXITY is good enough across a wider range of realistic evaluation settings. To this end, we carry out a family of experiments to compare PERPLEXITY against feature-based and pre-training-based methods from three perspectives: detection accuracy, detection speed, and generalization capability. The…
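
For readers unfamiliar with the metric behind PERPLEXITY, the following is a minimal sketch of the standard definition of perplexity (the exponential of the mean negative log-likelihood per token) with a simple threshold rule on top. It is not the paper's implementation: the function names, the threshold value, and the rule that lower perplexity is flagged as LLM-generated are assumptions made for illustration, and the per-token log-probabilities are made-up numbers rather than output from a real scoring model.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def looks_llm_generated(token_logprobs: Sequence[float], threshold: float) -> bool:
    """Hypothetical decision rule: flag a snippet whose perplexity is below the threshold.

    Assumption (not taken from the paper): code that a scoring model finds
    highly predictable is treated as more likely to be LLM-generated.
    """
    return perplexity(token_logprobs) < threshold


if __name__ == "__main__":
    # Made-up log-probabilities standing in for a scoring model's output.
    predictable = [math.log(0.5)] * 4   # each token assigned probability 0.5
    surprising = [math.log(0.1)] * 4    # each token assigned probability 0.1

    print(perplexity(predictable))      # close to 2
    print(perplexity(surprising))       # close to 10
    print(looks_llm_generated(predictable, threshold=5.0))
    print(looks_llm_generated(surprising, threshold=5.0))
```

In practice the log-probabilities would come from a language model scoring each token of the code under inspection, which is one reason detection speed is a separate evaluation criterion from accuracy.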
