LicenseGPT: A Fine-tuned Foundation Model for Publicly Available Dataset License Compliance
Jingwen Tan, Gopi Krishnan Rajbahadur, Zi Li, Xiangfu Song, Jianshan Lin, Dan Li, Zibin Zheng, Ahmed E. Hassan

TL;DR
LicenseGPT is a foundation model fine-tuned for dataset license compliance analysis. It raises Prediction Agreement to 64.30% and cuts license review time from 108 to 6 seconds for legal professionals working with publicly available datasets.
Contribution
The paper introduces LicenseGPT, a specialized legal foundation model that outperforms both existing legal FMs and general-purpose FMs on Prediction Agreement and reduces dataset license analysis time by 94.44%.
Findings
LicenseGPT achieves a Prediction Agreement (PA) of 64.30%, up from 43.75% for the best-performing existing legal FM.
It reduces analysis time for license review by 94.44%, from 108 to 6 seconds.
Legal professionals find LicenseGPT a valuable tool that increases efficiency without sacrificing accuracy.
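The reported time saving can be checked with a few lines of arithmetic. The sketch below uses only the two timings quoted above (108 and 6 seconds). The helper name `percent_reduction` is hypothetical and chosen for illustration; it is not taken from the paper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100


# Timings quoted in the findings (seconds).
before_seconds = 108
after_seconds = 6

reduction = percent_reduction(before_seconds, after_seconds)
speedup = before_seconds / after_seconds

print(f"Reduction: {reduction:.2f}%")  # Reduction: 94.44%
print(f"Speedup: {speedup:.0f}x")      # Speedup: 18x
```

Put another way, the same figures correspond to an 18-fold speedup in review time.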
Abstract
Dataset license compliance is a critical yet complex aspect of developing commercial AI products, particularly with the increasing use of publicly available datasets. Ambiguities in dataset licenses pose significant legal risks, making it challenging even for software IP lawyers to accurately interpret rights and obligations. In this paper, we introduce LicenseGPT, a fine-tuned foundation model (FM) specifically designed for dataset license compliance analysis. We first evaluate existing legal FMs (i.e., FMs specialized in understanding and processing legal texts) and find that the best-performing model achieves a Prediction Agreement (PA) of only 43.75%. LicenseGPT, fine-tuned on a curated dataset of 500 licenses annotated by legal experts, significantly improves PA to 64.30%, outperforming both legal and general-purpose FMs. Through an A/B test and user study with software IP lawyers,…
