Small Changes, Big Trouble: Demystifying and Parsing License Variants for Incompatibility Detection in the PyPI Ecosystem
Weiwei Xu, Hengzhi Ye, Kai Gao, Minghui Zhou

TL;DR
This paper empirically studies license variants in the PyPI ecosystem, revealing their prevalence and impact on license incompatibility, and introduces novel tools for efficient license analysis and compatibility detection.
Contribution
It provides the first empirical analysis of license variants in a software packaging ecosystem and develops two tools, LV-Parser and LV-Compat, for improved license variant detection and incompatibility analysis.
Findings
Textual license variations are common, but only 2% are substantively modified.
10.7% of the downstream dependencies of packages with license variants are license-incompatible.
LV-Parser achieves 93.6% accuracy and reduces costs by 30%.
Abstract
Open-source licenses establish the legal foundation for software reuse, yet license variants, including both modified standard licenses and custom-created alternatives, introduce significant compliance complexities. Despite their prevalence and potential impact, these variants are poorly understood in modern software systems, and existing tools do not account for their existence, leading to significant challenges in both effectiveness and efficiency of license analysis. To fill this knowledge gap, we conduct a comprehensive empirical study of license variants in the PyPI ecosystem. Our findings show that textual variations in licenses are common, yet only 2% involve substantive modifications. However, these license variants lead to significant compliance issues, with 10.7% of their downstream dependencies found to be license-incompatible. Inspired by our findings, we introduce…
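The abstract separates license texts that vary only textually from those with substantive modifications. The sketch below illustrates that distinction. It is not LV-Parser, whose method the summary above does not describe. Everything in it is a hypothetical assumption: the function names, the sample texts, and the normalization rules. The rules used are to drop copyright lines, lowercase, treat punctuation as cosmetic, and collapse whitespace. The sketch uses Python's standard `difflib` to list the word-level changes that remain after normalization.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Hypothetical normalization: drop copyright lines, lowercase,
    replace punctuation with spaces, and collapse whitespace."""
    kept = [ln for ln in text.splitlines()
            if not ln.strip().lower().startswith("copyright")]
    joined = " ".join(kept).lower()
    joined = re.sub(r"[^\w\s]", " ", joined)  # punctuation treated as cosmetic
    return " ".join(joined.split())


def classify_variant(candidate: str, reference: str) -> str:
    """Label a license text as 'exact', 'textual' (differs only in ways the
    normalization ignores) or 'substantive' (wording still differs)."""
    if candidate == reference:
        return "exact"
    if normalize(candidate) == normalize(reference):
        return "textual"
    return "substantive"


def changed_words(candidate: str, reference: str):
    """Word-level (tag, reference_words, candidate_words) diffs after normalization."""
    a = normalize(reference).split()
    b = normalize(candidate).split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


# Hypothetical sample texts (an MIT-style excerpt, not a full license).
REF = ("Copyright (c) 2020 Alice\n\n"
       "Permission is hereby granted, free of charge, to any person "
       "obtaining a copy of this software.")
TEXTUAL = ("Copyright (c) 2024 Bob\n"
           "Permission is hereby granted,  free of charge, to any person\n"
           "obtaining a copy of this software.")
SUBSTANTIVE = ("Copyright (c) 2024 Bob\n"
               "Permission is hereby granted, free of charge, to any person "
               "obtaining a copy of this software for non-commercial use.")

if __name__ == "__main__":
    print(classify_variant(TEXTUAL, REF))      # textual
    print(classify_variant(SUBSTANTIVE, REF))  # substantive
    print(changed_words(SUBSTANTIVE, REF))     # [('insert', '', 'for non commercial use')]
```

In this sketch, `TEXTUAL` differs from `REF` only in its copyright holder and line breaks, so it counts as a textual variant. `SUBSTANTIVE` adds a use restriction, and the word diff exposes that as an inserted clause. The restriction is the kind of change that could matter for compatibility with downstream dependencies. A real parser would need far more robust normalization and clause-level interpretation than these assumed rules.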
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Advanced Software Engineering Methodologies
