Can Large Language Models Generate High-quality Patent Claims?

Lekang Jiang; Caiqi Zhang; Pascal A Scherz; Stephan Goetz

arXiv:2406.19465·cs.CL·May 27, 2025

TL;DR

This study evaluates the ability of large language models to generate high-quality patent claims, revealing strengths in initial claim creation and highlighting the need for domain-specific models and further refinement.

Contribution

The paper constructs a new dataset for patent claim generation and compares general-purpose and patent-specific LLMs, showing that GPT-4 performs best and identifying areas for improvement.

Findings

01. GPT-4 outperforms other LLMs in patent claim quality

02. Current patent-specific LLMs underperform compared to general models

03. Fine-tuning improves claim completeness and clarity

Abstract

Large language models (LLMs) have shown exceptional performance across various text generation tasks but remain under-explored in the patent domain, which offers highly structured and precise language. This paper constructs a dataset to investigate the performance of current LLMs in patent claim generation. Our results demonstrate that generating claims based on patent descriptions outperforms previous research relying on abstracts. Interestingly, current patent-specific LLMs perform much worse than state-of-the-art general LLMs, highlighting the need for future research on in-domain LLMs. We also find that LLMs can produce high-quality first independent claims, but their performance decreases markedly for subsequent dependent claims. Moreover, fine-tuning can enhance the completeness of an invention's features, conceptual clarity, and feature linkage. Among the tested LLMs, GPT-4…
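The abstract distinguishes first independent claims from subsequent dependent claims. As a rough illustration of that distinction only, the sketch below separates the two using a simple heuristic: a claim counts as dependent if it refers back to an earlier claim number. The function name `split_claims`, the regular expression, and the example claims are all hypothetical and written for this illustration. They are not taken from the paper, its dataset, or its repository.

```python
import re

# Assumption: a dependent claim contains a back-reference such as "of claim 1".
CLAIM_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def split_claims(claims):
    """Return (independent, dependent) lists of 1-based claim numbers."""
    independent, dependent = [], []
    for number, text in enumerate(claims, start=1):
        match = CLAIM_REF.search(text)
        # Only a reference to an earlier claim marks this claim as dependent.
        if match and int(match.group(1)) < number:
            dependent.append(number)
        else:
            independent.append(number)
    return independent, dependent


# Made-up claims for illustration; not drawn from any real patent.
example = [
    "A device comprising a sensor and a processor coupled to the sensor.",
    "The device of claim 1, wherein the sensor is a temperature sensor.",
    "The device of claim 2, wherein the processor logs readings hourly.",
    "A method comprising reading a sensor and storing the reading.",
]
independent, dependent = split_claims(example)
```

A heuristic like this would miss unusual phrasings (for example "according to any preceding claim"), so it is best read as a way to picture the claim structure rather than as a reliable parser.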

Code & Models

Repositories

scylj1/llm4dpcg
PyTorch · Official

Datasets

lj408/HUPD-DCG
dataset · 7 dl

Videos

Can Large Language Models Generate High-quality Patent Claims? · Underline

Taxonomy

Topics: Intellectual Property and Patents

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Dense Connections