ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs
Arav Parikh, Shiri Dori-Hacohen

TL;DR
This paper introduces ClaimCompare, a novel data pipeline that generates labeled patent datasets for training machine learning models to identify novelty-destroying patents, with the aim of streamlining patent examination.
Contribution
ClaimCompare is the first pipeline capable of creating multiple datasets for novelty destruction detection, enabling improved ML model training for patent prior art searches.
Findings
Constructed a dataset with over 27,000 patents in the electrochemical domain.
Fine-tuned transformer models achieved a 29.2% improvement in mean reciprocal rank (MRR).
They also achieved a 32.7% improvement in precision at 1 (P@1) for identifying novelty-destroying patents.
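The two metrics reported above can be illustrated with a minimal sketch. It assumes each query (a base patent) comes with a ranked list of candidate patents, where a label of 1 marks a novelty-destroying candidate and 0 marks any other candidate. The function names and the toy rankings are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def reciprocal_rank(labels: Sequence[int]) -> float:
    """Return 1/rank of the first relevant item (label == 1), or 0.0 if none."""
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[int]]) -> float:
    """Average the reciprocal rank over all queries."""
    if not rankings:
        return 0.0
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def precision_at_1(rankings: Sequence[Sequence[int]]) -> float:
    """Fraction of queries whose top-ranked candidate is relevant."""
    if not rankings:
        return 0.0
    hits = sum(1 for r in rankings if len(r) > 0 and r[0] == 1)
    return hits / len(rankings)


# Hypothetical toy data: one ranked label list per base patent.
toy_rankings = [
    [1, 0, 0],     # first relevant candidate at rank 1
    [0, 1, 0],     # first relevant candidate at rank 2
    [0, 0, 0, 1],  # first relevant candidate at rank 4
    [0, 0, 0],     # no relevant candidate retrieved
]

print(mean_reciprocal_rank(toy_rankings))  # (1 + 1/2 + 1/4 + 0) / 4 = 0.4375
print(precision_at_1(toy_rankings))        # 1 of 4 queries = 0.25
```

MRR rewards placing the first novelty-destroying patent near the top of the list, while P@1 only checks whether the very first result is one, so the two can move differently for the same model.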
Abstract
A fundamental step in the patent application process is determining whether there exist prior patents that are novelty destroying. This step is routinely performed by both applicants and examiners in order to assess the novelty of proposed inventions among the millions of applications filed annually. However, conducting this search is time- and labor-intensive, as searchers must navigate complex legal and technical jargon while covering a large number of legal claims. Automated approaches that use information retrieval (IR) and machine learning (ML) to detect novelty-destroying patents present a promising avenue to streamline this process, yet research focusing on this space remains limited. In this paper, we introduce a novel data pipeline, ClaimCompare, designed to generate labeled patent claim datasets suitable for training IR and ML models to address this challenge of novelty…
Taxonomy
Topics: Intellectual Property and Patents · Research Data Management Practices · Innovation Policy and R&D
Methods: Balanced Selection
