On the Possibility of Breaking Copyleft Licenses When Reusing Code Generated by ChatGPT
Gaia Colombo, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli

TL;DR
This study investigates how AI code assistants like ChatGPT may inadvertently reproduce copyleft-licensed code, risking license violations, and explores how different generation settings affect this risk.
Contribution
It provides the first large-scale analysis of how often ChatGPT-generated code reproduces copyleft-licensed code, highlighting the factors that influence this phenomenon.
Findings
Larger context increases copyleft code reproduction
Higher temperature settings reduce the risk
Over 70,000 method implementations analyzed
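One general intuition for the temperature finding comes from how temperature sampling works. This is an illustration of standard temperature-scaled softmax, not the analysis performed in the paper. The model's logits are divided by the temperature before the softmax, so a higher temperature flattens the distribution and lowers the probability of the single most likely continuation, such as a memorized copyleft snippet. The minimal sketch below uses hypothetical logits. The function name `softmax_with_temperature` and all numbers are illustrative assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: index 0 stands for the token that would continue
# a memorized (copyleft) snippet; the other entries are alternatives.
logits = [5.0, 3.0, 2.5, 1.0]

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: P(memorized token) = {probs[0]:.3f}")
```

The study's result is empirical, measured on ChatGPT's outputs. This sketch only shows the direction of the effect for a single token and does not model how likely a whole method is to be reproduced.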
Abstract
AI assistants can help developers by recommending code to be included in their implementations (e.g., suggesting the implementation of a method from its signature). Although useful, these recommendations may mirror copyleft code available in public repositories, exposing developers to the risk of reusing code that they are allowed to reuse only under certain constraints (e.g., a specific license for the derivative software). This paper presents a large-scale study of the frequency and magnitude of this phenomenon in ChatGPT. In particular, we generate more than 70,000 method implementations using a range of configurations and prompts, revealing that a larger context increases the likelihood of reproducing copyleft code, but higher temperature settings can mitigate this issue.
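The abstract does not specify how generated code is matched against copyleft code. As a hypothetical illustration only, one simple way to flag possible reproduction is to tokenize both snippets and measure the fraction of the generated code's token n-grams that also appear in a known copyleft method. The function names, the regex tokenizer, and the default `n=5` below are assumptions, not the paper's method.

```python
import re

# Hypothetical tokenizer: identifiers, integer literals, or any single
# non-whitespace character (operators, brackets, punctuation).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(code):
    """Split source code into a list of tokens, ignoring whitespace."""
    return TOKEN_RE.findall(code)


def ngrams(tokens, n):
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(generated, reference, n=5):
    """Fraction of the generated code's token n-grams that also occur in the reference."""
    gen = ngrams(tokenize(generated), n)
    if not gen:
        return 0.0  # snippet too short to form any n-gram
    ref = ngrams(tokenize(reference), n)
    return len(gen & ref) / len(gen)


# Hypothetical example: a known copyleft method and two generated candidates.
copyleft_method = "int max(int a, int b) { return a > b ? a : b; }"
verbatim_copy = "int max(int a, int b) {\n    return a > b ? a : b;\n}"
unrelated = "def add(x, y):\n    return x + y"

print(overlap_ratio(verbatim_copy, copyleft_method))
print(overlap_ratio(unrelated, copyleft_method))
```

Because whitespace is dropped during tokenization, reformatting alone does not hide a copy. Any cutoff on this ratio for deciding that code was "reproduced" would be a further assumption. A ratio near 1.0 only indicates heavy token-level overlap with the reference snippet.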
