TL;DR
This paper trains supervised classifiers on embedding features to distinguish human-written code from ChatGPT-generated code, reaching up to 98% accuracy, and examines the models' calibration and explainability.
Contribution
It combines black-box embedding features with supervised learning algorithms (Deep Neural Networks, Random Forests, and Extreme Gradient Boosting) to detect the source of a piece of code with high accuracy, and adds interpretability tools.
Findings
Achieved 98% accuracy with combined embedding and deep learning models.
Models are well calibrated, enhancing trust in predictions.
Humans perform no better than random guessing on this task.
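To make the calibration finding concrete: a classifier is well calibrated when, among the snippets it scores at roughly 90% "ChatGPT-generated", about 90% really are. The sketch below computes two common calibration measures, the Brier score and the expected calibration error (ECE), in plain Python. The probabilities and labels are hypothetical illustration values, not results from the paper, and the summary does not say which calibration metrics the authors used.

```python
def brier_score(probs, labels):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins, then average
    |mean predicted probability - observed positive rate| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        observed_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_prob - observed_rate)
    return ece


# Hypothetical classifier outputs: probability that a snippet is ChatGPT-generated.
# Labels: 1 = ChatGPT-generated, 0 = human-written. Illustrative values only.
probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print("Brier score:", round(brier_score(probs, labels), 3))
print("ECE:", round(expected_calibration_error(probs, labels), 3))
```

Lower is better for both measures, and a perfectly calibrated, perfectly confident model scores 0 on each. Accuracy alone does not capture this: two models with the same accuracy can differ widely in how trustworthy their stated confidence is.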
Abstract
In recent times, large language models (LLMs) have made significant strides in generating computer code, blurring the lines between code created by humans and code produced by artificial intelligence (AI). As these technologies evolve rapidly, it is crucial to explore how they influence code generation, especially given the risk of misuse in areas like higher education. This paper explores this issue by using advanced classification techniques to differentiate between code written by humans and that generated by ChatGPT, a type of LLM. We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms - including Deep Neural Networks, Random Forests, and Extreme Gradient Boosting - to achieve this differentiation with an impressive accuracy of 98%. For the successful combinations, we also examine their model calibration, showing that some…
