LiCoEval: Evaluating LLMs on License Compliance in Code Generation

Weiwei Xu; Kai Gao; Hao He; Minghui Zhou

arXiv:2408.02487·cs.SE·February 26, 2025

TL;DR

This paper introduces LiCoEval, a benchmark for assessing LLMs' ability to provide accurate license information for generated code, revealing significant shortcomings in current models' compliance with open-source licenses.

Contribution

It establishes a novel benchmark, LiCoEval, together with an empirical standard for "striking similarity", and uses them to evaluate the license compliance of 14 LLMs in code generation.

Findings

1. Top-performing LLMs produce 0.88% to 2.01% of code that is strikingly similar to existing open-source code.

2. Most LLMs fail to provide correct license information, especially for code under copyleft licenses.

3. The study highlights the urgent need to improve LLM license compliance in code generation.

Abstract

Recent advances in Large Language Models (LLMs) have revolutionized code generation, leading to widespread adoption of AI coding tools by developers. However, LLMs can generate license-protected code without providing the necessary license information, leading to potential intellectual property violations during software production. This paper addresses the critical, yet underexplored, issue of license compliance in LLM-generated code by establishing a benchmark to evaluate the ability of LLMs to provide accurate license information for their generated code. To establish this benchmark, we conduct an empirical study to identify a reasonable standard for "striking similarity" that excludes the possibility of independent creation, indicating a copy relationship between the LLM output and certain open-source code. Based on this standard, we propose LiCoEval to evaluate the license…
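The abstract describes a two-stage evaluation: first decide which model outputs are "strikingly similar" to existing open-source code, then check whether the model supplied the correct license for those outputs. The Python sketch below illustrates only that bookkeeping, under loud assumptions: the character-level `difflib` ratio is a stand-in similarity metric, the 0.6 threshold is a hypothetical placeholder (the paper derives its own standard empirically), and `Sample`, `evaluate`, and the copyleft set are illustrative names, not LiCoEval's implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# HYPOTHETICAL threshold: the paper derives its own "striking similarity"
# standard empirically; 0.6 is a placeholder, not the paper's value.
STRIKING_SIMILARITY_THRESHOLD = 0.6

# Illustrative (incomplete) set of copyleft SPDX identifiers.
COPYLEFT_LICENSES = {"GPL-2.0-only", "GPL-3.0-only", "LGPL-2.1-only",
                     "AGPL-3.0-only", "MPL-2.0"}


@dataclass
class Sample:
    generated_code: str              # code produced by the LLM
    reference_code: str              # open-source code the task was derived from
    true_license: str                # SPDX id of the reference code's license
    reported_license: Optional[str]  # license the LLM stated, or None if it gave none


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in metric, not the paper's."""
    return SequenceMatcher(None, a, b).ratio()


def evaluate(samples: List[Sample]) -> Dict[str, float]:
    """Flag strikingly similar outputs, then score license accuracy on them."""
    flagged = [s for s in samples
               if similarity(s.generated_code, s.reference_code)
               >= STRIKING_SIMILARITY_THRESHOLD]
    result = {"similar_ratio": len(flagged) / len(samples) if samples else 0.0}
    groups = {
        "copyleft": [s for s in flagged if s.true_license in COPYLEFT_LICENSES],
        "other": [s for s in flagged if s.true_license not in COPYLEFT_LICENSES],
    }
    for name, members in groups.items():
        # Exact SPDX-id match counts as correct; a missing license counts as wrong.
        correct = sum(1 for s in members if s.reported_license == s.true_license)
        # 0.0 when a group has no flagged samples (no evidence either way).
        result[f"{name}_accuracy"] = correct / len(members) if members else 0.0
    return result


if __name__ == "__main__":
    code = "def add(a, b):\n    return a + b\n"
    demo = [
        Sample(code, code, "MIT", "MIT"),
        Sample("int main(void) { return 0; }", "int main(void) { return 0; }",
               "GPL-3.0-only", None),
        Sample("print('hello')", "SELECT * FROM users;", "Apache-2.0", None),
    ]
    print(evaluate(demo))
```

Splitting accuracy by license family mirrors the reported finding that copyleft code is where models most often omit or misstate the license; a faithful reproduction would replace the placeholder metric and threshold with the paper's own similarity standard.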

Taxonomy

Topics: Digital Rights Management and Security