Large-Scale Text Analysis Using Generative Language Models: A Case Study in Discovering Public Value Expressions in AI Patents

Sergio Pelaez; Gaurav Verma; Barbara Ribeiro; Philip Shapira

arXiv:2305.10383·cs.CL·December 31, 2024·5 cites

Open Access

TL;DR

This paper demonstrates how GPT-4 can be used to generate labels and rationales for large-scale text analysis of AI patents, effectively identifying public value expressions with high accuracy and efficiency.

Contribution

It introduces a novel framework leveraging GPT-4 for label and rationale generation in large-scale patent text analysis, improving accuracy and reducing costs.

Findings

01. GPT-4 produces accurate, diverse, and faithful labels and rationales.
02. The approach enables high F1 scores in classifying public value expressions.
03. Using GPT-4 labels to train classifiers yields high predictive performance.
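
For readers unfamiliar with the metric named in the findings, F1 is the harmonic mean of precision and recall. The following is a minimal sketch of how it could be computed for one target class. The label strings (`public_value`, `none`) and the toy data are hypothetical and are not taken from the paper, and the paper's exact label scheme and averaging method are not stated on this page.

```python
def f1_score(gold, predicted, positive="public_value"):
    """F1 for a single target class: harmonic mean of precision and recall.

    `positive` is a hypothetical label name used only for illustration.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold = ["public_value", "public_value", "none", "none", "public_value"]
predicted = ["public_value", "none", "none", "public_value", "public_value"]
score = f1_score(gold, predicted)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is F1.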

Abstract

Labeling data is essential for training text classifiers but is often difficult to accomplish accurately, especially for complex and abstract concepts. Seeking an improved method, this paper employs a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. We apply this approach to the task of discovering public value expressions in US AI patents. We collect a database comprising 154,934 patent documents using an advanced Boolean query submitted to InnovationQ+. The results are merged with full patent text from the USPTO, resulting in 5.4 million sentences. We design a framework for identifying and labeling public value expressions in these AI patent sentences. A prompt for GPT-4 is developed which includes definitions, guidelines, examples, and rationales for text classification. We evaluate the quality of the labels and…
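
The abstract describes a prompt built from definitions, guidelines, examples, and rationales. A minimal sketch of how such a prompt could be assembled and how a reply could be parsed is given below. This is not the authors' prompt: the field layout, the `Example` class, the function names, the requested `Label:` / `Rationale:` reply format, and all sample text are assumptions made for illustration, and no model API is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A worked example shown in the prompt (hypothetical structure)."""
    sentence: str
    label: str
    rationale: str


def build_prompt(definition, guidelines, examples, sentence):
    """Assemble a few-shot classification prompt as plain text."""
    parts = ["Definition:", definition, "", "Guidelines:"]
    parts += [f"- {g}" for g in guidelines]
    parts += ["", "Examples:"]
    for ex in examples:
        parts += [
            f"Sentence: {ex.sentence}",
            f"Label: {ex.label}",
            f"Rationale: {ex.rationale}",
            "",
        ]
    parts += [
        "Classify the sentence below. Reply with two lines,",
        "'Label: <label>' and 'Rationale: <reason>'.",
        f"Sentence: {sentence}",
    ]
    return "\n".join(parts)


def parse_response(text):
    """Extract the first 'Label:' and 'Rationale:' lines from a reply."""
    label, rationale = None, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # Line has no 'key: value' shape.
        key = key.strip().lower()
        if key == "label" and label is None:
            label = value.strip()
        elif key == "rationale" and rationale is None:
            rationale = value.strip()
    return label, rationale


# Invented sample content, for illustration only.
prompt = build_prompt(
    definition="A public value expression states a benefit to society.",
    guidelines=["Label only explicit statements.", "Ignore purely technical claims."],
    examples=[Example("The system improves patient safety.", "public_value",
                      "It names a societal benefit.")],
    sentence="The method reduces memory usage.",
)
label, rationale = parse_response(
    "Label: none\nRationale: Describes a technical gain only."
)
```

Keeping the reply format fixed to two labelled lines makes the output easy to parse and lets the rationale be stored next to each label for later quality checks.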

Taxonomy

Topics: Computational and Text Analysis Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dense Connections · Adam · Residual Connection · Absolute Position Encodings · Softmax · Layer Normalization