On the Possibility of Breaking Copyleft Licenses When Reusing Code Generated by ChatGPT

Gaia Colombo; Leonardo Mariani; Daniela Micucci; Oliviero Riganelli

arXiv:2502.05023·cs.SE·February 10, 2025

Open Access

TL;DR

This study investigates how AI code assistants like ChatGPT may inadvertently reproduce copyleft-licensed code, risking license violations, and explores how different generation settings affect this risk.

Contribution

It provides a large-scale analysis of how often ChatGPT-generated code reproduces copyleft-licensed code, and identifies the generation settings that influence this.

Findings

01  Larger context increases copyleft code reproduction

02  Higher temperature settings reduce the risk

03  Over 70,000 method implementations analyzed

Abstract

AI assistants can help developers by recommending code to be included in their implementations (e.g., suggesting the implementation of a method from its signature). Although useful, these recommendations may mirror copyleft code available in public repositories, exposing developers to the risk of reusing code that they are allowed to reuse only under certain constraints (e.g., a specific license for the derivative software). This paper presents a large-scale study about the frequency and magnitude of this phenomenon in ChatGPT. In particular, we generate more than 70,000 method implementations using a range of configurations and prompts, revealing that a larger context increases the likelihood of reproducing copyleft code, but higher temperature settings can mitigate this issue.
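
To make the setup in the abstract concrete, here is a minimal sketch of how one might cross a set of context levels with temperature values and score a generated method against a known copyleft reference. It is not the authors' pipeline: the context labels, the temperature values, the token-level similarity measure (Python's `difflib.SequenceMatcher`), and the sample Java snippets are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import product


def tokenize(code: str) -> list[str]:
    """Split code into identifiers, numbers, and single punctuation tokens.

    Whitespace is discarded, so formatting differences do not affect the score.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def similarity(generated: str, reference: str) -> float:
    """Token-level similarity ratio in [0, 1] (illustrative measure only)."""
    matcher = SequenceMatcher(
        None, tokenize(generated), tokenize(reference), autojunk=False
    )
    return matcher.ratio()


def configurations(context_levels, temperatures):
    """Enumerate every (context, temperature) pair to be tested."""
    return list(product(context_levels, temperatures))


# Hypothetical settings; the paper's actual values are not given in the abstract.
CONTEXT_LEVELS = ["signature_only", "signature_plus_class"]
TEMPERATURES = [0.0, 0.5, 1.0]

# Hypothetical reference method and two hypothetical generated candidates.
reference = "public int add(int a, int b) { return a + b; }"
reformatted_copy = "public int add(int a,int b){return a+b;}"
rewritten = "public int add(int x, int y) { int s = x; s += y; return s; }"

if __name__ == "__main__":
    for context, temperature in configurations(CONTEXT_LEVELS, TEMPERATURES):
        # A real study would call the model here with these settings.
        print(f"would generate with context={context}, temperature={temperature}")
    print("reformatted copy:", similarity(reformatted_copy, reference))
    print("rewritten method:", similarity(rewritten, reference))
```

Because whitespace is dropped during tokenization, the reformatted copy scores 1.0 against the reference, while the rewritten method scores lower. Any cutoff for flagging a candidate as a likely reproduction would be a further assumption.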

Taxonomy

Topics: FinTech, Crowdfunding, Digital Finance · Scientific Computing and Data Management · Artificial Intelligence in Healthcare and Education