Beyond Pairwise Comparisons: A Distributional Test of Distinctiveness for Machine-Generated Works in Intellectual Property Law
Anirban Mukherjee, Hannah Hanwen Chang

TL;DR
This paper introduces a distributional two-sample test that applies maximum mean discrepancy (MMD) to semantic embeddings to assess the distinctiveness of machine-generated works. The test addresses the limitations of pairwise comparisons, especially for generative processes with effectively unbounded output spaces.
Contribution
The paper proposes a novel, task-agnostic, and sample-efficient distributional testing framework for evaluating the distinctiveness of AI-generated outputs across domains, including images and text.
Findings
Detects distributional differences with as few as 5-10 images or 7-20 texts.
Reveals that AI outputs are statistically distinguishable from human-created works.
Contradicts the idea that generative models merely regurgitate training data.
Abstract
Key doctrines, including novelty (patent), originality (copyright), and distinctiveness (trademark), turn on a shared empirical question: whether a body of work is meaningfully distinct from a relevant reference class. Yet analyses typically operationalize this set-level inquiry using item-level evidence: pairwise comparisons among exemplars. That unit-of-analysis mismatch may be manageable for finite corpora of human-created works, where it can be bridged by ad hoc aggregations. But it becomes acute for machine-generated works, where the object of evaluation is not a fixed set of works but a generative process with an effectively unbounded output space. We propose a distributional alternative: a two-sample test based on maximum mean discrepancy computed on semantic embeddings to determine if two creative processes, human or machine, produce statistically distinguishable output…
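The abstract does not specify the kernel, estimator, or embedding model, so the following is only a minimal sketch of a generic MMD two-sample test under stated assumptions, not the authors' implementation. It assumes each work has already been mapped to a fixed-length embedding vector by some encoder, uses a Gaussian (RBF) kernel with a median-heuristic bandwidth, the unbiased MMD² estimator, and a permutation test for the p-value. All function names are hypothetical, and the demo uses synthetic Gaussian vectors as stand-ins for real embeddings.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def median_sq_distance(points: List[Vector]) -> float:
    """Median heuristic (assumed): median of nonzero pairwise squared distances."""
    d = [
        sq_dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]
    d = [v for v in d if v > 0]
    return statistics.median(d) if d else 1.0  # 1.0 if all points coincide


def mmd2_unbiased(K: List[List[float]], ix: List[int], iy: List[int]) -> float:
    """Unbiased MMD^2 estimate from a precomputed pooled kernel matrix K."""
    m, n = len(ix), len(iy)
    kxx = sum(K[i][j] for i in ix for j in ix if i != j) / (m * (m - 1))
    kyy = sum(K[i][j] for i in iy for j in iy if i != j) / (n * (n - 1))
    kxy = sum(K[i][j] for i in ix for j in iy) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(
    x: List[Vector], y: List[Vector], num_permutations: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """Return (observed MMD^2, p-value) for H0: x and y come from one distribution."""
    if len(x) < 2 or len(y) < 2:
        raise ValueError("each sample needs at least two embeddings")
    pooled = list(x) + list(y)
    scale = median_sq_distance(pooled)
    # Gaussian kernel k(a, b) = exp(-||a - b||^2 / scale), bandwidth fixed on pooled data
    K = [[math.exp(-sq_dist(a, b) / scale) for b in pooled] for a in pooled]
    m = len(x)
    idx = list(range(len(pooled)))
    observed = mmd2_unbiased(K, idx[:m], idx[m:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(idx)  # random relabelling of pooled items, valid under H0
        if mmd2_unbiased(K, idx[:m], idx[m:]) >= observed:
            exceed += 1
    # add-one correction keeps the permutation p-value strictly positive
    return observed, (exceed + 1) / (num_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 8

    def sample(shift: float, k: int) -> List[List[float]]:
        return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(k)]

    human = sample(0.0, 20)    # synthetic stand-in for human-work embeddings
    machine = sample(1.5, 20)  # synthetic stand-in for model-output embeddings
    print(mmd_permutation_test(human, machine, num_permutations=500))
```

Because the kernel matrix is computed once on the pooled sample, each permutation only re-indexes it, which keeps tests on small samples cheap. The median heuristic and RBF kernel are common defaults, not choices confirmed by the paper.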
Taxonomy
Topics: Law, AI, and Intellectual Property · Ethics and Social Impacts of AI · Copyright and Intellectual Property
