Calculating Originality of LLM Assisted Source Code

Shipra Sharma; Balwinder Sodhi

arXiv:2307.04492 · cs.SE · July 11, 2023

Open Access

TL;DR

This paper introduces a neural network-based tool to assess the extent of LLM-generated content in source code, aiding educators in identifying AI-assisted code contributions.

Contribution

It presents a novel approach using neural networks and complexity measures to quantify LLM contribution in source code, addressing a key challenge in code originality detection.

Findings

01 Promising initial results on moderate-sized code samples

02 Effective differentiation between human and LLM-generated code

03 Potential for aiding academic integrity and code evaluation

Abstract

The ease of using Large Language Models (LLMs) to answer a wide variety of queries, together with their high availability, has led to their integration into many applications. LLM-based recommenders are now routinely used by students as well as professional software programmers for code generation and testing. Although LLM-based technology has proven useful, its unethical and unattributed use by students and professionals is a growing cause for concern. There is therefore a need for tools and technologies that can assist teachers and other evaluators in identifying whether any portion of a piece of source code is LLM generated. In this paper, we propose a neural network-based tool that instructors can use to determine the original effort (and the LLM's contribution) put in by students when writing source code. Our tool is motivated by minimum description length measures like Kolmogorov complexity. Our…
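The abstract says the tool is motivated by minimum description length measures such as Kolmogorov complexity. Kolmogorov complexity is not computable, so a common stand-in is the size of a compressed encoding. The sketch below illustrates that general idea only; it is not the authors' neural network-based tool. It assumes a reference LLM output is available for comparison, and the function names (`compressed_size`, `conditional_size`, `originality_estimate`) are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    This is only a rough, computable proxy for Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def conditional_size(target: str, reference: str) -> int:
    """Approximate the description length of `target` given `reference`.

    Uses the standard compression heuristic C(reference + target) - C(reference):
    the extra bytes needed to encode the target once the reference is known.
    """
    ref = reference.encode("utf-8")
    tgt = target.encode("utf-8")
    return max(compressed_size(ref + tgt) - compressed_size(ref), 0)


def originality_estimate(submission: str, llm_output: str) -> float:
    """Hypothetical score in [0, 1]: share of the submission's compressed
    size that is NOT explained by the LLM output (an assumption of this
    sketch, not a definition taken from the paper)."""
    own = compressed_size(submission.encode("utf-8"))  # never zero for zlib
    novel = conditional_size(submission, llm_output)
    return min(novel / own, 1.0)


if __name__ == "__main__":
    llm_code = "def add(a, b):\n    return a + b\n"
    student_code = "def add(a, b):\n    return a + b\n\nprint(add(2, 3))\n"
    print(originality_estimate(student_code, llm_code))
```

Under these assumptions, a submission copied verbatim from the LLM output scores close to 0 and unrelated code scores much higher. On very short snippets the compressor's fixed overhead dominates, so the numbers are only indicative.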

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research