Enriching Patent Claim Generation with European Patent Dataset
Lekang Jiang, Chengzu Li, Stephan Goetz

TL;DR
This paper introduces the European Patent Dataset (EPD), a high-quality, diverse dataset of European patents designed to improve patent claim generation models and better reflect real-world legal and drafting standards.
Contribution
The paper presents EPD, a new European patent dataset that enhances claim generation research by providing jurisdictional diversity, high-quality legal texts, and challenging real-world samples.
Findings
LLMs fine-tuned on EPD outperform both models fine-tuned on previous datasets and GPT-4o in claim quality.
Experiments show significant performance gaps on challenging EPD subsets.
EPD enables more comprehensive evaluation of patent claim generation models.
Abstract
Drafting patent claims is time-intensive, costly, and requires professional skill. Researchers have therefore investigated large language models (LLMs) to assist inventors in writing claims. However, existing work has largely relied on datasets from the United States Patent and Trademark Office (USPTO). To broaden the research scope across jurisdictions, drafting conventions, and legal standards, we introduce EPD, a European patent dataset. EPD provides rich textual data and structured metadata to support multiple patent-related tasks, including claim generation. This dataset enriches the field in three critical aspects: (1) Jurisdictional diversity: patents from different offices follow different legal and drafting conventions, and EPD fills a critical gap by providing a benchmark for European patents that enables more comprehensive evaluation. (2) Quality improvement: EPD offers high-quality…
Taxonomy
Topics: Intellectual Property and Patents · Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI)
