ChatGPT Code Detection: Techniques for Uncovering the Source of Code

Marc Oedingen; Raphael C. Engelhardt; Robin Denz; Maximilian Hammer; Wolfgang Konen

arXiv:2405.15512·cs.LG·July 4, 2024

TL;DR

This paper trains supervised classifiers on embedding features to distinguish human-written code from ChatGPT-generated code, reaching up to 98% accuracy, and examines the calibration and explainability of the resulting models.

Contribution

It combines black-box embedding features with supervised learning algorithms (Deep Neural Networks, Random Forests, and Extreme Gradient Boosting) for high-accuracy detection of a code sample's source, and adds interpretability tools.

Findings

01. Achieved 98% accuracy with combined embedding and deep learning models.

02. Models are well calibrated, enhancing trust in predictions.

03. Humans perform no better than random guessing on this task.
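
To make the calibration finding concrete: a classifier is well calibrated when its predicted probabilities match observed frequencies, so that among samples scored around 0.8 as "ChatGPT-generated", roughly 80% really are. The sketch below shows two common ways to quantify this, the expected calibration error (ECE) and the Brier score. It is an illustration only, not the authors' evaluation code; the function names, the 10-bin default, and the label convention (1 = ChatGPT-generated, 0 = human-written) are assumptions made here.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between predicted probability and observed
    positive rate, computed over equal-width probability bins.

    Hypothetical helper; labels are assumed to be 1 (ChatGPT) or 0 (human).
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; p == 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(positive_rate - mean_prob)
    return ece


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probability and the 0/1 label."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)
```

Both functions take the predicted probability of the positive class for each sample plus the true labels; lower values indicate better calibration, and an ECE near zero means the confidence scores can be read as approximate frequencies.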

Abstract

In recent times, large language models (LLMs) have made significant strides in generating computer code, blurring the lines between code created by humans and code produced by artificial intelligence (AI). As these technologies evolve rapidly, it is crucial to explore how they influence code generation, especially given the risk of misuse in areas like higher education. This paper explores this issue by using advanced classification techniques to differentiate between code written by humans and that generated by ChatGPT, a type of LLM. We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms - including Deep Neural Networks, Random Forests, and Extreme Gradient Boosting - to achieve this differentiation with an impressive accuracy of 98%. For the successful combinations, we also examine their model calibration, showing that some…

Code & Models

Repositories

MarcOedingen/ChatGPT-Code-Detection (Official)
