An automatic patent literature retrieval system based on LLM-RAG
Yao Ding, Yuqing Wu, Ziyang Ding

TL;DR
This paper introduces an automated patent retrieval system that combines Large Language Models with Retrieval-Augmented Generation to improve semantic accuracy and relevance in patent literature searches.
Contribution
It presents a novel integrated framework leveraging LLMs and RAG technology for efficient, accurate patent retrieval and classification, outperforming traditional methods.
Findings
Achieved 80.5% semantic matching accuracy
Surpassed baseline LLM methods by 28 percentage points
Demonstrated strong generalization in cross-domain tasks
Abstract
With the acceleration of technological innovation, efficient retrieval and classification of patent literature have become essential for intellectual property management and enterprise R&D. Traditional keyword and rule-based retrieval methods often fail to address complex query intents or capture semantic associations across technical domains, resulting in incomplete and low-relevance results. This study presents an automated patent retrieval framework integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) technology. The system comprises three components: 1) a preprocessing module for patent data standardization, 2) a high-efficiency vector retrieval engine leveraging LLM-generated embeddings, and 3) a RAG-enhanced query module that combines external document retrieval with context-aware response generation. Evaluations were conducted on the Google Patents dataset (2006-2024)…
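The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the term-count "embedding" is a stand-in for LLM-generated embeddings, the prompt text is returned rather than sent to a model, and every name here (`preprocess`, `embed`, `retrieve`, `build_prompt`, `TOY_CORPUS`) is hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; real input would be standardized patent records.
TOY_CORPUS = {
    "P1": "Lithium battery electrode coating method for electric vehicles",
    "P2": "Neural network image compression for video streaming",
    "P3": "Solid-state battery electrolyte with improved ion conductivity",
}

def preprocess(text: str) -> list[str]:
    """Component 1 (assumed form): lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text: str) -> Counter:
    """Stand-in for an LLM embedding: a sparse term-count vector."""
    return Counter(preprocess(text))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Component 2: rank documents by similarity to the query, return top-k ids."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda pid: cosine(q, embed(corpus[pid])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Component 3: place retrieved documents in the context given to a generator."""
    context = "\n".join(f"[{pid}] {corpus[pid]}" for pid in retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

if __name__ == "__main__":
    print(build_prompt("battery electrode coating", TOY_CORPUS))
```

In a real system the `embed` function would call an embedding model and the ranking would typically run against a vector index rather than a linear scan; the sketch only shows how retrieval output feeds the generation step.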
Taxonomy
Topics: Intellectual Property and Patents · Research Data Management Practices
