Copyright Detection in Large Language Models: An Ethical Approach to Generative AI Development

David Szczecina; Senan Gaffori; Edmond Li

arXiv:2511.20623 · cs.AI · March 20, 2026

Open Access

TL;DR

This paper presents an open-source, scalable, and user-friendly copyright detection platform for Large Language Models, enhancing transparency and ethical compliance in AI development by efficiently verifying training data inclusion.

Contribution

It introduces a novel, accessible copyright detection tool that improves similarity detection and reduces computational costs, supporting responsible AI practices.

Findings

01 Reduces computational overhead by 10-30%

02 Provides an intuitive interface for content creators

03 Enhances transparency in AI training data verification

Abstract

The widespread use of Large Language Models (LLMs) raises critical concerns regarding the unauthorized inclusion of copyrighted content in training data. Existing detection frameworks, such as DE-COP, are computationally intensive and largely inaccessible to independent creators. As legal scrutiny increases, there is a pressing need for a scalable, transparent, and user-friendly solution. This paper introduces an open-source copyright detection platform that enables content creators to verify whether their work was used in LLM training datasets. Our approach enhances existing methodologies by facilitating ease of use, improving similarity detection, optimizing dataset validation, and reducing computational overhead by 10-30% with efficient API calls. With an intuitive user interface and scalable backend, this framework contributes to increasing transparency in AI development and ethical…

Taxonomy

Topics: Law, AI, and Intellectual Property · Authorship Attribution and Profiling · Explainable Artificial Intelligence (XAI)