A novel approach to measuring the scope of patent claims based on probabilities obtained from (large) language models
Sébastien Ragot

TL;DR
This paper introduces a method to quantify patent claim scope using probabilities from language models: the less probable, and hence more surprising, the concepts a claim requires, the narrower its scope. The method is validated with seven language models, from simple frequency-based models to large language models such as GPT-2 and davinci-002.
Contribution
It presents a novel information-theoretic approach to measuring patent claim scope that leverages language model probabilities, and shows that large language models perform better than simpler frequency-based methods.
Findings
LLMs outperform frequency-based models in scope measurement
Character count is a more reliable indicator of scope than word count
Simplest models relate scope to the reciprocal of claim length
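A short derivation shows why the simplest models tie scope to claim length. It assumes a uniform model over an alphabet of K symbols (characters or words, each equally probable), a claim of n symbols, and base-2 logarithms (the base only rescales the result):

```latex
P(\text{claim}) = K^{-n}, \qquad
I(\text{claim}) = -\log_2 P(\text{claim}) = n \log_2 K, \qquad
\text{scope} = \frac{1}{I(\text{claim})} = \frac{1}{n \log_2 K} \propto \frac{1}{n}
```

For a fixed alphabet, scope therefore falls as the claim gets longer, whether length is counted in characters or in words.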
Abstract
This work proposes to measure the scope of a patent claim as the reciprocal of the self-information contained in the claim. Self-information is calculated from the probability of occurrence of the claim, where this probability is obtained from a language model. Grounded in information theory, this approach rests on the assumption that an unlikely concept is more informative than a usual one, insofar as it is more surprising. In turn, the more surprising the information required to define the claim, the narrower its scope. Seven language models are considered, ranging from the simplest models (each word or character has an identical probability), through intermediate models (based on average word or character frequencies), to large language models (LLMs) such as GPT-2 and davinci-002. Remarkably, when using the simplest language models to compute the probabilities, the scope becomes…
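The abstract's definitions (scope as the reciprocal of self-information, with probabilities from a language model) can be sketched in Python for the two character-level model families. This is a minimal illustration, not the author's implementation. It assumes base-2 logarithms and characters treated as independent. The helpers `uniform_model`, `frequency_model`, `self_information` and `scope` are hypothetical names. The reference corpus, the alphabet size and the fallback probability for unseen characters are likewise assumptions. An LLM-based variant would instead sum the conditional log-probabilities of the claim's tokens.

```python
import math
from collections import Counter


def self_information(text: str, char_prob) -> float:
    """Self-information in bits, -log2 P(text), assuming independent characters."""
    return -sum(math.log2(char_prob(c)) for c in text)


def scope(text: str, char_prob) -> float:
    """Scope as the reciprocal of self-information (undefined for empty text)."""
    return 1.0 / self_information(text, char_prob)


def uniform_model(alphabet_size: int):
    """Simplest model: every character has the same probability 1/K (K assumed)."""
    return lambda c: 1.0 / alphabet_size


def frequency_model(reference_corpus: str):
    """Intermediate model: character probabilities from frequencies in a
    reference corpus supplied by the caller (hypothetical corpus)."""
    counts = Counter(reference_corpus)
    total = len(reference_corpus)
    # Hypothetical fallback so characters absent from the corpus
    # still get a small nonzero probability.
    floor = 1.0 / (total + 1)
    return lambda c: counts[c] / total if counts[c] else floor


claim_a = "A device comprising a housing."
claim_b = "A device comprising a housing and a sensor mounted inside the housing."
u = uniform_model(27)
print(scope(claim_a, u) > scope(claim_b, u))  # True: the longer claim is narrower
```

Under the uniform model the comparison reduces to claim length, while under the frequency model rarer characters contribute more bits, so two claims of equal length can still differ in scope.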
Taxonomy
Topics: Intellectual Property and Patents · Computational Drug Discovery Methods
Methods: High-Order Consensuses
