Patent Figure Classification using Large Vision-language Models

Sushil Awale; Eric Müller-Budack; Ralph Ewerth

arXiv:2501.12751·cs.IR·January 23, 2025

TL;DR

This paper investigates the use of large vision-language models for patent figure classification, introducing new datasets and a novel classification strategy to improve multi-aspect understanding in few-shot scenarios.

Contribution

It introduces new datasets for patent figure classification and proposes a tournament-style classification method leveraging LVLMs for multi-aspect analysis.
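The summary above does not spell out how the tournament-style method works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation: candidate labels are split into small groups, a chooser picks one winner per group, and the winners advance until a single label remains. The names `tournament_classify`, `pick_best`, and `group_size`, the toy scores, and the placeholder labels are all hypothetical; in practice `pick_best` would wrap an LVLM prompt that shows the figure alongside the group's candidate labels.

```python
from typing import Callable, List, Sequence


def tournament_classify(
    labels: Sequence[str],
    pick_best: Callable[[Sequence[str]], str],
    group_size: int = 4,
) -> str:
    """Reduce a large label set to one label through rounds of small groups.

    pick_best is a hypothetical stand-in for an LVLM query: given a short
    list of candidate labels, it returns the one it considers most fitting.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")

    candidates: List[str] = list(labels)
    while len(candidates) > 1:
        winners: List[str] = []
        for start in range(0, len(candidates), group_size):
            group = candidates[start:start + group_size]
            if len(group) == 1:
                # A leftover single label advances without a query.
                winners.append(group[0])
                continue
            winner = pick_best(group)
            if winner not in group:
                raise ValueError(f"chooser returned unknown label: {winner!r}")
            winners.append(winner)
        candidates = winners
    return candidates[0]


# Toy chooser (assumption): a fixed score table replaces the model.
labels = [f"class_{i}" for i in range(10)]
toy_scores = {label: 0.1 for label in labels}
toy_scores["class_7"] = 0.9


def toy_pick(group: Sequence[str]) -> str:
    return max(group, key=lambda label: toy_scores[label])


result = tournament_classify(labels, toy_pick, group_size=3)
print(result)
```

Each query in this sketch sees at most `group_size` labels, which is the property that would keep prompts short when the label set is large; the trade-off is that the correct label must win every round it takes part in.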

Findings

01 LVLMs are effective for patent figure classification in few-shot settings.

02 The proposed tournament-style approach improves classification accuracy.

03 LVLMs outperform traditional CNNs in zero-shot and few-shot scenarios.

Abstract

Patent figure classification facilitates faceted search in patent retrieval systems, enabling efficient prior art search. Existing approaches have explored patent figure classification for only a single aspect and for aspects with a limited number of concepts. In recent years, large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks; however, they remain unexplored for patent figure classification. Our work explores the efficacy of LVLMs in patent figure visual question answering (VQA) and classification, focusing on zero-shot and few-shot learning scenarios. For this purpose, we introduce new datasets, PatFigVQA and PatFigCLS, for fine-tuning and evaluation regarding multiple aspects of patent figures (i.e., type, projection, patent class, and objects). For computationally efficient handling of a large number of classes…

Code & Models

Repositories

tibhannover/patent-figure-classification
PyTorch · Official

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Handwritten Text Recognition Techniques · Metallurgy and Material Forming