PEDANTIC: A Dataset for the Automatic Examination of Definiteness in Patent Claims

Valentin Knappich; Annemarie Friedrich; Anna Hätty; Simon Razniewski

arXiv:2505.21342·cs.CL·June 19, 2025

TL;DR

PEDANTIC is a new dataset of 14,000 US patent claims from NLP-related patent applications, annotated with reasons for indefiniteness. It is built with an automated LLM-based pipeline and is intended to support research on automatic patent definiteness examination.

Contribution

The paper introduces PEDANTIC, the first large-scale annotated dataset for patent definiteness. It is generated by an automated pipeline whose annotation quality is validated in a human study, and it is intended to facilitate research in automatic patent examination.

Findings

01

LLMs often struggle to outperform simple models in definiteness prediction.

02

The pipeline produces high-quality annotations, as validated by a human study.

03

LLMs can identify reasons for indefiniteness but do not always improve prediction accuracy.

Abstract

Patent claims define the scope of protection for an invention. If there are ambiguities in a claim, it is rejected by the patent office. In the US, this is referred to as indefiniteness (35 U.S.C. § 112(b)) and is among the most frequent reasons for patent application rejection. The development of automatic methods for patent definiteness examination has the potential to make patent drafting and examination more efficient, but no annotated dataset has been published to date. We introduce PEDANTIC (Patent Definiteness Examination Corpus), a novel dataset of 14k US patent claims from patent applications relating to Natural Language Processing (NLP), annotated with reasons for indefiniteness. We construct PEDANTIC using a fully automatic pipeline that retrieves office action documents from the USPTO and uses Large Language Models (LLMs) to extract the reasons for indefiniteness. A human…

Taxonomy

Topics: Intellectual Property and Patents · Explainable Artificial Intelligence (XAI) · Law, AI, and Intellectual Property