AI Prior Art Search: Semantic Clusters and Evaluation Infrastructure
Boris Genin (1), Alexander Gorbunov (2), Dmitry Zolkin (1), Igor Nekrasov (1) ((1) Division for the Design of Information Search Systems, Federal Institute of Industrial Property, Berezhkovskaya nab. 30-1, Moscow, 125993

TL;DR
This paper introduces infrastructure and datasets for AI-based patent prior art search, focusing on semantic clustering of documents and tools for evaluating search quality, facilitating research and development in this domain.
Contribution
The work provides a comprehensive infrastructure including large datasets, semantic cluster definitions, and evaluation tools for AI-driven patent prior art search.
Findings
Created datasets with 14 million US patent clusters and 1 million Russian clusters.
Developed a utility for automated evaluation of search quality.
Defined semantic clusters as key to understanding the state of the art.
Abstract
The key to success in automating prior art search in patent research using artificial intelligence (AI) lies in developing large datasets for machine learning (ML) and making them available. This work offers a comprehensive solution to the problem of creating research infrastructure in this field, including datasets and tools for calculating search quality criteria. The paper presents the authors' concept of semantic clusters of patent documents, which determine the state of the art in a given subject, and gives a definition of such clusters. Prior art search is then framed as the task of identifying the elements of the semantic cluster of patent documents in the subject area specified by the document under consideration. A generator of user-configurable datasets for ML, based on collections of U.S. and Russian patent documents,…
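As an illustration of treating prior art search as recovery of a semantic cluster, the sketch below scores a ranked result list against the cluster of a query document. It is a minimal example under stated assumptions, not the authors' evaluation utility: the abstract does not name the quality criteria used, so recall at a cutoff k is chosen here as one plausible criterion, and the function names, document IDs, and data layout are all hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], cluster: Set[str], k: int) -> float:
    """Fraction of the cluster's documents that appear in the top-k results.

    Assumption: the semantic cluster is given as a set of document IDs and
    the search output as a ranked list of IDs (both hypothetical formats).
    """
    if not cluster:
        return 0.0
    # Use a set so a document returned twice is only counted once.
    hits = len(set(ranked[:k]) & cluster)
    return hits / len(cluster)


def mean_recall_at_k(
    results: Dict[str, List[str]],
    clusters: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average recall@k over all query documents that have a cluster."""
    scores = [
        recall_at_k(results.get(query_id, []), cluster, k)
        for query_id, cluster in clusters.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data with made-up IDs: two query documents and their clusters.
clusters = {"Q1": {"A", "B", "C"}, "Q2": {"D"}}
results = {"Q1": ["A", "X", "B", "Y"], "Q2": ["Z", "D"]}

print(mean_recall_at_k(results, clusters, k=2))
```

With this toy data, Q1 recovers one of its three cluster members in the top two results and Q2 recovers its single member, so the mean is the average of 1/3 and 1. Recall is a natural fit when the target is a whole cluster rather than a single best match, but other criteria (precision, rank-based measures) could be substituted in the same structure.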
Taxonomy
Topics: Advanced Research in Systems and Signal Processing · Intellectual Property and Patents · Big Data and Digital Economy
