Large Language Model Informed Patent Image Retrieval
Hao-Cheng Lo, Jung-Mei Chu, Jieh Hsiang, Chun-Chieh Cho

TL;DR
This paper introduces a novel multimodal approach that combines large language models with image features to improve patent image retrieval, addressing generalizability and class imbalance issues, and demonstrating state-of-the-art results on a large dataset.
Contribution
The paper presents a language-informed, distribution-aware multimodal method for patent image retrieval that enriches the semantic understanding of patent images and addresses class imbalance, reporting state-of-the-art results.
Findings
Improves mAP by +53.3% in patent image retrieval
Improves Recall@10 by +41.8%
Enhances MRR@10 by +51.9%
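For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how mAP, Recall@10, and MRR@10 are commonly computed for a retrieval system. It is not taken from the paper: the function names, the dictionary-based input format, and the choice to normalize Recall@k by the total number of relevant items are all assumptions made for illustration, and the authors' exact definitions may differ.

```python
from typing import Dict, List, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant item within the top k, or 0 if none."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values taken at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def evaluate(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 10,
) -> Dict[str, float]:
    """Average each metric over all queries (hypothetical input format:
    query id -> ranked list of retrieved ids, query id -> set of relevant ids)."""
    n = len(rankings)
    if n == 0:
        return {"mAP": 0.0, f"Recall@{k}": 0.0, f"MRR@{k}": 0.0}
    ap = rr = rec = 0.0
    for query_id, ranked in rankings.items():
        relevant = ground_truth.get(query_id, set())
        ap += average_precision(ranked, relevant)
        rr += reciprocal_rank_at_k(ranked, relevant, k)
        rec += recall_at_k(ranked, relevant, k)
    return {"mAP": ap / n, f"Recall@{k}": rec / n, f"MRR@{k}": rr / n}
```

Under these definitions, a query whose only relevant image is ranked second contributes 0.5 to both the mAP and MRR@10 averages and 1.0 to Recall@10. The summary does not state whether the reported gains are absolute or relative, so the sketch makes no claim about how they were derived.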
Abstract
In patent prosecution, image-based retrieval systems for identifying similarities between current patent images and prior art are pivotal to ensure the novelty and non-obviousness of patent applications. Despite their growing popularity in recent years, existing attempts, while effective at recognizing images within the same patent, fail to deliver practical value due to their limited generalizability in retrieving relevant prior art. Moreover, this task inherently involves the challenges posed by the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information of image descriptions. Therefore, we propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent images by integrating Large Language Models and improves the performance of…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Machine Learning in Materials Science · Radiomics and Machine Learning in Medical Imaging
