A Survey on Sentence Embedding Models Performance for Patent Analysis
Hamid Bekamiri, Daniel S. Hain, Roman Jurowetzki

TL;DR
This survey evaluates the performance of various sentence embedding models for patent analysis, highlighting the top algorithms and proposing a standard dataset for assessing embedding accuracy in patent similarity tasks.
Contribution
The paper provides a comprehensive overview of embedding model performance for patent similarity, introduces a standard evaluation library and dataset, and compares top models across patent classification levels.
Findings
PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings perform best at the subclass level.
Model performance varies across different patent classes and sections.
The study offers guidance for selecting appropriate embedding models based on patent data segments.
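As an illustration of the "TF-IDF Weighted Word Embeddings" idea named in the findings, the following is a minimal sketch, not the authors' implementation. It assumes a patent vector is the TF-IDF-weighted average of its word vectors. The smoothed IDF formula, the function names, and the toy word vectors are all assumptions made for the example.

```python
import math
from collections import Counter


def idf_weights(tokenized_docs):
    """Compute a smoothed inverse document frequency per token.

    Assumption: this smoothing variant is chosen for illustration only;
    the survey's exact weighting scheme is not given in this summary.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each token once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def tfidf_weighted_embedding(tokens, word_vectors, idf):
    """Average the word vectors of `tokens`, weighting each by tf * idf.

    Returns None when no token has both a word vector and an IDF weight.
    """
    counts = Counter(t for t in tokens if t in word_vectors and t in idf)
    if not counts:
        return None
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token, tf in counts.items():
        weight = tf * idf[token]
        weight_sum += weight
        for i, value in enumerate(word_vectors[token]):
            total[i] += weight * value
    return [x / weight_sum for x in total]


# Hypothetical toy data: two tokenized "patents" and 2-d word vectors.
docs = [["battery", "cell"], ["battery", "sensor"]]
word_vectors = {
    "battery": [1.0, 0.0],
    "cell": [0.0, 1.0],
    "sensor": [1.0, 1.0],
}
idf = idf_weights(docs)
doc0_vector = tfidf_weighted_embedding(docs[0], word_vectors, idf)
```

In this toy setup "cell" appears in only one document, so it receives a larger weight than "battery" and pulls the first document's vector toward its own direction.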
Abstract
Patent data is an important source of knowledge for innovation research, and the technological similarity between pairs of patents is a key enabling indicator for patent analysis. Recently, researchers have been using patent vector space models based on different NLP embedding models to calculate the technological similarity between pairs of patents, in order to better understand innovations, patent landscaping, technology mapping, and patent quality evaluation. More often than not, text embedding is a vital precursor to patent analysis tasks. A pertinent question then arises: How should we measure and evaluate the accuracy of these embeddings? To the best of our knowledge, there is no comprehensive survey that clearly delineates embedding models' performance for calculating patent similarity indicators. Therefore, in this study, we provide an overview of the accuracy of these…
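To make the notion of technological similarity in a patent vector space concrete, here is a minimal sketch that scores two patents by the cosine similarity of their embedding vectors. Cosine similarity is a common choice for embedding comparison and is assumed here; this summary does not state which measure the survey uses. The three-dimensional vectors are hypothetical stand-ins for the output of a real embedding model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; a real model would produce hundreds of dimensions.
patent_a = [1.0, 0.0, 1.0]
patent_b = [1.0, 0.0, 1.0]
patent_c = [0.0, 1.0, 0.0]

sim_ab = cosine_similarity(patent_a, patent_b)  # identical direction
sim_ac = cosine_similarity(patent_a, patent_c)  # orthogonal vectors
```

Under this measure, patents whose vectors point in the same direction score close to 1 and patents with orthogonal vectors score 0, regardless of vector length.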
Taxonomy
Topics: Intellectual Property and Patents
