Automatic Detection of LLM-Generated Code: A Comparative Case Study of Contemporary Models Across Function and Class Granularities

Musfiqur Rahman; SayedHassan Khatoonabadi; Ahmad Abdellatif; Emad Shihab

arXiv:2409.01382 · cs.SE · December 23, 2025 · 2 cites

Open Access

TL;DR

This study compares code generated by four LLMs (GPT-3.5, Claude 3 Haiku, Claude Haiku 4.5, and GPT-OSS) and evaluates detection of that code at function and class granularity. It finds that the two granularities rely on different structural signatures and that detection needs diverse, model-aware strategies.

Contribution

The study provides a systematic cross-model validation of LLM-generated code detectors and uncovers granularity-dependent detection signatures, highlighting the limitations of existing detectors, which lack such validation and operate as opaque "black boxes".

Findings

01

Granularity effects dominate model differences by a factor of 8.6.

02

Detectability varies significantly across models and granularities.

03

Comment-to-Code Ratio is a universal but variably effective discriminator.
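To make finding 03 concrete, here is a minimal sketch of how a comment-to-code ratio could be computed for a Python snippet. The paper's exact definition is not given on this page, so the definition used here (lines containing a comment divided by lines containing code, with docstrings counted as code) is an assumption, and the function name `comment_to_code_ratio` is hypothetical.

```python
import io
import tokenize


def comment_to_code_ratio(source: str) -> float:
    """Return (#lines with a comment) / (#lines with code) for Python source.

    Assumed definition, not necessarily the paper's: docstrings and other
    string literals count as code, and a line holding both code and an
    inline comment counts toward both totals.
    """
    # Token types that carry no code content of their own.
    layout = {
        tokenize.NL,
        tokenize.NEWLINE,
        tokenize.INDENT,
        tokenize.DEDENT,
        tokenize.ENDMARKER,
    }
    comment_lines = set()
    code_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])
        elif tok.type not in layout:
            # Multi-line tokens (e.g. triple-quoted strings) span several lines.
            code_lines.update(range(tok.start[0], tok.end[0] + 1))
    if not code_lines:
        return 0.0
    return len(comment_lines) / len(code_lines)


sample = (
    "def add(a, b):\n"
    "    # add two numbers\n"
    "    total = a + b  # inline comment\n"
    "    return total\n"
)
# Two lines carry comments (2 and 3); three lines carry code (1, 3 and 4).
ratio = comment_to_code_ratio(sample)
```

Using the tokenizer rather than searching for `#` avoids miscounting hash characters inside string literals. Snippets that do not tokenize cleanly will raise `tokenize.TokenError`, which a real feature-extraction pipeline would need to handle.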

Abstract

The adoption of Large Language Models (LLMs) for code generation risks incorporating vulnerable code into software systems. Existing detectors face two critical limitations: a lack of systematic cross-model validation and opaque "black box" operation. We address this through a comparative study of code generated by four distinct LLMs: GPT-3.5, Claude 3 Haiku, Claude Haiku 4.5, and GPT-OSS. Analyzing 14,485 Python functions and 11,913 classes from the CodeSearchNet dataset, we generated corresponding code with all four LLMs. Using interpretable software metrics, we trained CatBoost classifiers for each configuration. Our analysis reveals that granularity effects dominate model differences by a factor of 8.6, with negligible feature overlap, indicating that function-level and class-level detection rely on fundamentally disjoint structural signatures. We discover critical…

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques