A novel approach to measuring the scope of patent claims based on probabilities obtained from (large) language models

Sébastien Ragot

arXiv:2309.10003 · cs.CL · November 14, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a method to quantify patent claim scope using probabilities from language models: the less probable, and hence more surprising, the concepts needed to define a claim, the narrower its scope. The method is evaluated on seven language models, from uniform and frequency-based models to large language models such as GPT-2 and davinci-002.

Contribution

The paper presents an information-theoretic measure of patent claim scope based on language model probabilities, and shows that large language models outperform simpler frequency-based models.

Findings

01. LLMs outperform frequency-based models in scope measurement

02. Character count is a more reliable indicator than word count

03. Simplest models relate scope to the reciprocal of claim length
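
The third finding can be checked with a short derivation. This is a sketch under the assumption that every word of a claim is drawn independently with the same probability 1/V from a vocabulary of size V. The symbols C (the claim), n (its word count), V, I (self-information) and S (scope) are notation chosen here for illustration, not taken from the paper.

```latex
\begin{align}
p(C) &= \left(\tfrac{1}{V}\right)^{n}, \\
I(C) &= -\log p(C) = n \log V, \\
S(C) &= \frac{1}{I(C)} = \frac{1}{n \log V} \;\propto\; \frac{1}{n}.
\end{align}
```

Under these assumptions the vocabulary size only contributes a constant factor, so the scope depends on the claim through its length alone. The same argument applies to characters in place of words.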

Abstract

This work proposes to measure the scope of a patent claim as the reciprocal of self-information contained in this claim. Self-information is calculated based on a probability of occurrence of the claim, where this probability is obtained from a language model. Grounded in information theory, this approach is based on the assumption that an unlikely concept is more informative than a usual concept, insofar as it is more surprising. In turn, the more surprising the information required to define the claim, the narrower its scope. Seven language models are considered, ranging from simplest models (each word or character has an identical probability) to intermediate models (based on average word or character frequencies), to large language models (LLMs) such as GPT2 and davinci-002. Remarkably, when using the simplest language models to compute the probabilities, the scope becomes…
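
The definition above can be illustrated with a minimal Python sketch. It assumes that a language model supplies one conditional probability per token, that the claim probability is their product, and that self-information is measured in bits (the abstract does not fix the logarithm base). The function names and the toy probabilities are hypothetical and are not output from any model used in the paper.

```python
import math
from typing import Sequence


def self_information(token_probs: Sequence[float]) -> float:
    """Self-information of a claim in bits: -log2 of the product of token probabilities."""
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range (0, 1]: {p}")
    # Summing logs avoids underflow from multiplying many small probabilities.
    return -sum(math.log2(p) for p in token_probs)


def scope(token_probs: Sequence[float]) -> float:
    """Scope as the reciprocal of self-information (infinite if the claim carries no information)."""
    info = self_information(token_probs)
    return math.inf if info == 0.0 else 1.0 / info


# Toy example (assumed probabilities): two claims of equal length, 8 tokens each.
common_claim = [0.5] * 8                 # only unsurprising tokens
rare_claim = [0.5] * 6 + [0.125] * 2     # two tokens are much less likely

common_scope = scope(common_claim)
rare_scope = scope(rare_claim)

# Simplest model (assumed vocabulary of 1024 equiprobable words):
# doubling the claim length halves the scope.
short_scope = scope([1 / 1024] * 10)
long_scope = scope([1 / 1024] * 20)
```

In this toy setting the claim containing the two less likely tokens receives the smaller scope even though both claims have the same length, which is the distinction a length-only measure cannot make.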

Taxonomy

Topics: Intellectual Property and Patents · Computational Drug Discovery Methods

Methods: High-Order Consensuses