LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

Mahir Labib Dihan; Abir Muhtasim

arXiv:2604.16058 · cs.SE · April 20, 2026

TL;DR

LLMSniffer is a detection framework that fine-tunes GraphCodeBERT with supervised contrastive learning to distinguish LLM-generated code from human-written code. It raises accuracy over prior baselines from 70% to 78% on GPTSniffer and from 91% to 94.65% on Whodunit.

Contribution

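The paper introduces a two-stage supervised contrastive fine-tuning pipeline for GraphCodeBERT, combined with comment-removal preprocessing and an MLP classifier, to improve detection of LLM-generated code. The authors release model checkpoints, datasets, code, and a live interactive demo.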

Findings

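1. On GPTSniffer, accuracy improves from 70% to 78% over prior baselines (F1: 68% to 78%).

2. On Whodunit, accuracy improves from 91% to 94.65% (F1: 91% to 94.64%).

3. t-SNE visualizations indicate that contrastive fine-tuning produces well-separated, compact embeddings.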

Abstract

The rapid proliferation of Large Language Models (LLMs) in software development has made distinguishing AI-generated code from human-written code a critical challenge with implications for academic integrity, code quality assurance, and software security. We present LLMSniffer, a detection framework that fine-tunes GraphCodeBERT using a two-stage supervised contrastive learning pipeline augmented with comment-removal preprocessing and an MLP classifier. Evaluated on two benchmark datasets - GPTSniffer and Whodunit - LLMSniffer achieves substantial improvements over prior baselines: accuracy increases from 70% to 78% on GPTSniffer (F1: 68% to 78%) and from 91% to 94.65% on Whodunit (F1: 91% to 94.64%). t-SNE visualizations confirm that contrastive fine-tuning yields well-separated, compact embeddings. We release our model checkpoints, datasets, code and a live interactive demo to…
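
The abstract names supervised contrastive learning as the fine-tuning objective, but this page does not give the loss itself. As a rough illustration only, the sketch below assumes the commonly used supervised contrastive (SupCon) formulation, in which each sample is pulled toward the other samples that share its label and pushed away from the rest. The function name `supcon_loss`, the temperature of 0.1, and the toy 2-D embeddings are assumptions made for this example; they are not taken from the paper, whose actual pipeline operates on GraphCodeBERT embeddings and may differ in detail.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch).

    embeddings: list of equal-length float vectors (hypothetical encoder outputs).
    labels:     list of class ids, e.g. 0 = human-written, 1 = LLM-generated.
    """
    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples in the batch with the same label as anchor i.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarity between anchor i and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )

        # Mean negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1

    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2-D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(toy_embeddings, [0, 0, 1, 1])  # labels match clusters
loss_mixed = supcon_loss(toy_embeddings, [0, 1, 0, 1])    # labels cut across clusters
```

In this toy batch the loss is small when the labels agree with the clusters and large when they cut across them, which is the property that drives same-class embeddings together during fine-tuning.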
