Query Generation for Patent Retrieval with Keyword Extraction based on Syntactic Features
Julien Rossi, Matthias Wirth, Evangelos Kanoulas

TL;DR
This paper introduces a keyword extraction method that combines syntactic features of patent claims with NLP parsing, and reports better prior art search results than keywords extracted with traditional techniques such as tf-idf.
Contribution
It presents a new approach combining syntactic analysis and NLP for extracting keywords from patent claims, enhancing prior art search effectiveness.
Findings
Keywords extracted with the proposed method outperform tf-idf keywords in search results
Syntactic features improve relevance of retrieved patents
Method yields better search performance on patent claims
Abstract
This paper describes a new method to extract relevant keywords from patent claims, as part of the task of retrieving other patents with similar claims (search for prior art). The method combines a qualitative analysis of the writing style of the claims with NLP methods to parse text, in order to represent a legal text as a specialization arborescence of terms. In this setting, the set of extracted keywords yields better search results than keywords extracted with traditional methods such as tf-idf. Performance is measured on the search results of a query consisting of the extracted keywords.
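For orientation, the tf-idf baseline that the abstract compares against can be sketched as follows. This is not the authors' syntactic method; it is a minimal illustration of extracting keywords by tf-idf and joining them into a query. The function names (`tokenize`, `tfidf_keywords`, `build_query`), the small `STOPWORDS` set, the smoothed idf formula, and the toy claims are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_keywords(claim, corpus, top_k=5):
    """Rank the terms of `claim` by tf-idf against `corpus`; return the top_k."""
    docs = [set(tokenize(d)) for d in corpus]
    n_docs = len(docs)
    tf = Counter(tokenize(claim))
    total = sum(tf.values())
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in docs if term in d)
        # Smoothed idf, so terms unseen in the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def build_query(keywords):
    """Join the extracted keywords into a plain keyword query string."""
    return " ".join(keywords)


corpus = [
    "a device comprising a housing",
    "a method comprising a step",
    "a system comprising a sensor",
]
claim = "a device comprising a piezoelectric sensor and a piezoelectric actuator"
query = build_query(tfidf_keywords(claim, corpus, top_k=3))
```

Terms that appear in every claim, such as "comprising", receive a low idf and drop out of the query, while terms specific to the claim rise to the top. The resulting query string is what would be submitted to a search engine to measure retrieval performance.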
Taxonomy
Topics: Advanced Text Analysis Techniques · Web Data Mining and Analysis · Semantic Web and Ontologies
