DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding
Zhu Wang, Homaira Huda Shomee, Sathya N. Ravi, Sourav Medya

TL;DR
DesignCLIP leverages CLIP-based multimodal learning to improve design patent classification and retrieval by incorporating detailed captions and multi-view image analysis, outperforming existing models.
Contribution
This work introduces DesignCLIP, a novel multimodal framework using CLIP for design patent understanding, with class-aware classification and contrastive learning tailored for patent data.
Findings
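Outperforms baseline and state-of-the-art models on design patent tasks
Improves both design patent classification and patent image retrieval
Strengthens multimodal patent analysis by pairing images with detailed captions and multi-view information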
Abstract
In the field of design patent analysis, traditional tasks such as patent classification and patent image retrieval depend heavily on image data. However, patent images, typically consisting of sketches with abstract and structural elements of an invention, often fall short in conveying comprehensive visual context and semantic information. This inadequacy can lead to ambiguities in evaluation during prior art searches. Recent advancements in vision-language models, such as CLIP, offer promising opportunities for more reliable and accurate AI-driven patent analysis. In this work, we leverage CLIP models to develop a unified framework, DesignCLIP, for design patent applications, using a large-scale dataset of U.S. design patents. To address the unique characteristics of patent data, DesignCLIP incorporates class-aware classification and contrastive learning, utilizing generated…
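For readers unfamiliar with the contrastive learning that CLIP-style models rely on, the following is a minimal sketch of a generic symmetric image-text contrastive loss. It is not DesignCLIP's class-aware objective, which this summary does not specify. The function names, the toy embeddings, and the fixed temperature of 0.07 are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct column for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: image_embs[i] and text_embs[i] form a matched pair.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transpose to score each text against every image.
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy usage: correctly paired embeddings give a lower loss than swapped ones.
aligned = clip_style_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss pulls each image embedding toward its own caption and away from the other captions in the batch, and does the same in the text-to-image direction. In a real system the embeddings would come from the image and text encoders rather than hand-written lists.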
Taxonomy
Topics: Intellectual Property and Patents · Machine Learning in Materials Science · Advanced Graph Neural Networks
