PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shreya Shukla, Nakul Sharma, Manish Gupta, Anand Mishra

TL;DR
PatentLMM is a multimodal large language model specialized for generating detailed descriptions of patent figures. It is accompanied by a large dataset, PatentDesc-355K, and aims to automate and improve patent documentation.
Contribution
The paper introduces PatentDesc-355K, a large-scale dataset of patent figures paired with brief and detailed textual descriptions, and PatentLMM, a domain-specific multimodal model for generating patent figure descriptions, advancing automation in patent documentation.
Findings
Vision encoder tailored for patent figures improves description quality.
PatentLMM generates more coherent descriptions than off-the-shelf multimodal models.
Public release of dataset and code facilitates further research.
Abstract
Writing comprehensive and accurate descriptions of technical drawings in patent documents is crucial to effective knowledge sharing and enabling the replication and protection of intellectual property. However, automation of this task has been largely overlooked by the research community. To this end, we introduce PatentDesc-355K, a novel large-scale dataset containing ~355K patent figures along with their brief and detailed textual descriptions extracted from more than 60K US patent documents. In addition, we propose PatentLMM - a novel multimodal large language model specifically tailored to generate high-quality descriptions of patent figures. Our proposed PatentLMM comprises two key components: (i) PatentMME, a specialized multimodal vision encoder that captures the unique structural elements of patent figures, and (ii) PatentLLaMA, a domain-adapted version of LLaMA fine-tuned on a…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: LLaMA
