An automatic patent literature retrieval system based on LLM-RAG

Yao Ding; Yuqing Wu; Ziyang Ding

arXiv:2508.14064·cs.IR·August 21, 2025

Open Access

TL;DR

This paper introduces an automated patent retrieval system that combines Large Language Models with Retrieval-Augmented Generation to improve semantic accuracy and relevance in patent literature searches.

Contribution

It presents a novel integrated framework leveraging LLMs and RAG technology for efficient, accurate patent retrieval and classification, outperforming traditional methods.

Findings

01

Achieved 80.5% semantic matching accuracy

02

Surpassed baseline LLM methods by 28 percentage points

03

Demonstrated strong generalization in cross-domain tasks

Abstract

With the acceleration of technological innovation, efficient retrieval and classification of patent literature have become essential for intellectual property management and enterprise R&D. Traditional keyword and rule-based retrieval methods often fail to address complex query intents or capture semantic associations across technical domains, resulting in incomplete and low-relevance results. This study presents an automated patent retrieval framework integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) technology. The system comprises three components: 1) a preprocessing module for patent data standardization, 2) a high-efficiency vector retrieval engine leveraging LLM-generated embeddings, and 3) a RAG-enhanced query module that combines external document retrieval with context-aware response generation. Evaluations were conducted on the Google Patents dataset (2006-2024)…
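The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for LLM-generated embeddings, the sketch stops at assembling a prompt rather than calling a language model, and every name here (`preprocess`, `embed`, `retrieve`, `build_prompt`, the `PATENTS` corpus) is hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def preprocess(text: str) -> str:
    """Component 1 (assumed form): lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def embed(text: str) -> Counter:
    """Toy stand-in for an LLM embedding: a sparse term-count vector."""
    return Counter(preprocess(text).split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Component 2: rank documents by similarity to the query, keep the top k."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Component 3: place retrieved documents in the context given to a generator."""
    hits = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."


# Hypothetical mini-corpus; not drawn from the Google Patents dataset.
PATENTS = {
    "P1": "A lithium-ion battery electrode with a silicon anode coating.",
    "P2": "Method for wireless charging of electric vehicle batteries.",
    "P3": "A neural network accelerator for image classification.",
}

if __name__ == "__main__":
    print(build_prompt("silicon anode battery electrode", PATENTS))
```

In a real system the term-count vectors would be replaced by dense embeddings from a model and the linear scan by an approximate nearest-neighbor index; the prompt returned by `build_prompt` would then be passed to an LLM for the context-aware response.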

Taxonomy

Topics: Intellectual Property and Patents · Research Data Management Practices