Large-Scale Text Analysis Using Generative Language Models: A Case Study in Discovering Public Value Expressions in AI Patents
Sergio Pelaez, Gaurav Verma, Barbara Ribeiro, Philip Shapira

TL;DR
This paper shows how GPT-4 can generate labels and rationales for large-scale text analysis, applied here to identifying public value expressions in US AI patent sentences accurately and efficiently.
Contribution
The paper introduces a framework that uses GPT-4 to generate labels and rationales for large-scale patent text analysis, with the aim of improving labeling accuracy and reducing annotation cost.
Findings
GPT-4 produces accurate, diverse, and faithful labels and rationales.
The approach enables high F1 scores in classifying public value expressions.
Using GPT-4 labels to train classifiers yields high predictive performance.
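The F1 score mentioned in the findings is the harmonic mean of precision and recall. As a minimal sketch, assuming a binary label (1 = sentence contains a public value expression, 0 = it does not), it could be computed as below. The function name and the toy labels are invented for illustration and are not taken from the paper, which may use a different label scheme or averaging method.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only (not results from the paper).
reference = [1, 0, 1, 1, 0, 0, 1, 0]   # e.g. labels used as the reference
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # e.g. labels from a trained classifier

precision, recall, f1 = f1_score_binary(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy data there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 0.75.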
Abstract
Labeling data is essential for training text classifiers but is often difficult to accomplish accurately, especially for complex and abstract concepts. Seeking an improved method, this paper employs a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. We apply this approach to the task of discovering public value expressions in US AI patents. We collect a database comprising 154,934 patent documents using an advanced Boolean query submitted to InnovationQ+. The results are merged with full patent text from the USPTO, resulting in 5.4 million sentences. We design a framework for identifying and labeling public value expressions in these AI patent sentences. A prompt for GPT-4 is developed which includes definitions, guidelines, examples, and rationales for text classification. We evaluate the quality of the labels and…
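The abstract describes a prompt that combines definitions, guidelines, examples, and rationales. A rough sketch of how such a prompt might be assembled, and how a label and rationale might be read back from a model reply, is shown below. Everything here is an assumption for illustration: the names `Example`, `build_prompt`, and `parse_response`, the placeholder texts, and the `Label:` / `Rationale:` reply format are hypothetical and do not reproduce the authors' actual prompt. No model API is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A worked example shown to the model (hypothetical structure)."""
    sentence: str
    label: str
    rationale: str


def build_prompt(definition, guidelines, examples, sentence):
    """Assemble definition, guidelines, and labeled examples into one prompt string."""
    parts = ["Definition:", definition, "", "Guidelines:"]
    parts += [f"- {g}" for g in guidelines]
    parts += ["", "Examples:"]
    for ex in examples:
        parts += [
            f"Sentence: {ex.sentence}",
            f"Label: {ex.label}",
            f"Rationale: {ex.rationale}",
            "",
        ]
    parts += [
        "Classify the following sentence.",
        "Answer with one line 'Label: <label>' and one line 'Rationale: <reason>'.",
        f"Sentence: {sentence}",
    ]
    return "\n".join(parts)


def parse_response(text):
    """Extract the first 'Label:' and 'Rationale:' lines from a model reply."""
    label, rationale = None, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # line has no "key: value" shape
        key = key.strip().lower()
        if key == "label" and label is None:
            label = value.strip()
        elif key == "rationale" and rationale is None:
            rationale = value.strip()
    return label, rationale


# Placeholder inputs; the real definitions and examples are in the paper, not here.
prompt = build_prompt(
    definition="<definition of a public value expression>",
    guidelines=["<guideline 1>", "<guideline 2>"],
    examples=[Example("<example sentence>", "<example label>", "<example rationale>")],
    sentence="<patent sentence to classify>",
)

# A made-up reply in the assumed format, used only to exercise the parser.
label, rationale = parse_response("Label: yes\nRationale: <reason given by the model>")
```

Keeping the reply format to two fixed keys makes the output easy to parse across millions of sentences, though a production pipeline would also need to handle replies that do not follow the format (here both fields would simply be `None`).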
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dense Connections · Adam · Residual Connection · Absolute Position Encodings · Softmax · Layer Normalization
