Introduction of a novel word embedding approach based on technology labels extracted from patent data
Mark Standke, Abdullah Kiwan, Annalena Lange, Silvan Berg

TL;DR
This paper introduces a new word embedding method leveraging patent technology labels to generate accurate, language-independent vectors for technical terms, addressing the challenge of diverse patent language.
Contribution
It presents a novel word embedding approach, based on statistical analysis and designed specifically for patent terminology, that aims to improve synonym detection in patent searches.
Findings
First qualitative results are shown and suggest that the approach is effective.
The resulting algorithm is a development of the former EQMania UG (eqmania.com).
The algorithm can be tested online at eqalice.com until April 2021.
Abstract
Diversity in patent language is growing, which makes finding synonyms for patent searches increasingly challenging. Moreover, most approaches for dealing with diverse patent language rely on manual search and human intuition. This paper introduces a word embedding approach that uses statistical analysis of human-labeled data to produce accurate, language-independent word vectors for technical terms. It focuses on explaining the idea behind the statistical analysis and presents first qualitative results. The resulting algorithm is a development of the former EQMania UG (eqmania.com) and can be tested at eqalice.com until April 2021.
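The abstract does not spell out the statistical analysis, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each patent contributes a set of terms and a set of human-assigned technology labels, that a term's vector is its normalized label co-occurrence distribution, and that synonym candidates are ranked by cosine similarity. The function names, the toy patents, and the classification-style label strings are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_label_vectors(patents):
    """Map each term to a normalized distribution over technology labels.

    `patents` is an iterable of (terms, labels) pairs. This is an assumed
    input format, not one taken from the paper.
    """
    counts = defaultdict(Counter)
    for terms, labels in patents:
        for term in set(terms):          # count each term once per patent
            for label in labels:
                counts[term][label] += 1
    vectors = {}
    for term, label_counts in counts.items():
        total = sum(label_counts.values())
        vectors[term] = {label: n / total for label, n in label_counts.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(label, 0.0) for label, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_terms(term, vectors, top_k=3):
    """Rank all other terms by similarity to `term` (synonym candidates)."""
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: English and German terms sharing the same labels.
TOY_PATENTS = [
    (["screw", "thread", "fastener"], ["F16B"]),
    (["schraube", "gewinde"], ["F16B"]),
    (["bolt", "fastener"], ["F16B"]),
    (["antibody", "protein"], ["C07K"]),
    (["antikörper", "protein"], ["C07K", "A61K"]),
]

vectors = build_label_vectors(TOY_PATENTS)
```

Because the vectors live in label space rather than in the vocabulary of any one language, terms from different languages that occur under the same labels end up close together in this sketch, which is one plausible reading of "language-independent". In the toy data, `cosine(vectors["screw"], vectors["schraube"])` is high while `cosine(vectors["screw"], vectors["antibody"])` is zero.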
Taxonomy
Topics: Intellectual Property and Patents · Biomedical Text Mining and Ontologies
