Calculating Originality of LLM Assisted Source Code
Shipra Sharma, Balwinder Sodhi

TL;DR
This paper introduces a neural network-based tool to assess the extent of LLM-generated content in source code, aiding educators in identifying AI-assisted code contributions.
Contribution
It presents a novel approach using neural networks and complexity measures to quantify LLM contribution in source code, addressing a key challenge in code originality detection.
Findings
Promising initial results on moderate-sized code samples
Effective differentiation between human and LLM-generated code
Potential for aiding academic integrity and code evaluation
Abstract
The ease of use and high availability of Large Language Models (LLMs) for answering a wide variety of queries have resulted in LLMs being integrated into various applications. LLM-based recommenders are now routinely used by students as well as professional software programmers for code generation and testing. Though LLM-based technology has proven useful, its unethical and unattributed use by students and professionals is a growing cause of concern. As such, there is a need for tools and technologies that can assist teachers and other evaluators in identifying whether any portion of a piece of source code is LLM generated. In this paper, we propose a neural network-based tool that instructors can use to determine the original effort put in by students (and the LLM's contribution) when writing source code. Our tool is motivated by minimum description length measures like Kolmogorov complexity. Our…
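To make the minimum description length intuition concrete, the sketch below uses compressed size as a rough, computable stand-in for Kolmogorov complexity (which is itself uncomputable). It is not the neural network-based tool described in the abstract. The function names, the zlib-based proxy, and the ratio used as a score are all assumptions chosen for illustration: the idea is that a submission adds little new information if it compresses to almost nothing once an LLM's output is already known.

```python
import zlib


def compressed_size(text: str) -> int:
    """Bytes needed to store `text` after zlib compression.

    A crude upper-bound proxy for Kolmogorov complexity (assumption:
    zlib is used only because it ships with Python).
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def conditional_size(target: str, context: str) -> int:
    """Approximate extra bytes needed to describe `target` given `context`."""
    joint = compressed_size(context + "\n" + target)
    return max(0, joint - compressed_size(context))


def originality_estimate(submission: str, llm_output: str) -> float:
    """Hypothetical score in [0, 1]: share of the submission's description
    length that is not already explained by the LLM output."""
    alone = compressed_size(submission)
    given = conditional_size(submission, llm_output)
    return min(1.0, given / alone)


# Hypothetical inputs for illustration only.
llm_output = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)
copied = llm_output
independent = (
    "class Stack:\n"
    "    def __init__(self):\n"
    "        self.items = []\n"
    "    def push(self, x):\n"
    "        self.items.append(x)\n"
)

print(originality_estimate(copied, llm_output))
print(originality_estimate(independent, llm_output))
```

Under these assumptions, a verbatim copy of the LLM output should score well below independently written code. A general-purpose compressor only captures surface-level repetition, so this proxy would miss renamed identifiers or restructured logic.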
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
