# Contrastive learning enhanced retrieval-augmented few-shot framework for multi-label patent classification

**Authors:** Wenlong Zheng, Xin Li, Guoqing Cui, Shikun Chen, Haofeng Zhang

PMC · DOI: 10.1371/journal.pone.0341118 · PLOS One · 2026-01-21

## TL;DR

This paper introduces a framework for multi-label patent classification that combines contrastive pre-training with semantic retrieval. It outperforms few-shot baselines while reducing reliance on extensive annotations.

## Contribution

A novel retrieval-augmented few-shot learning framework that combines patent-specific contrastive pre-training with semantic retrieval for multi-label patent classification.
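The retrieval step can be pictured as nearest-neighbour search over document embeddings: for a new patent, the most similar labeled patents are fetched and placed in a few-shot prompt. The sketch below is illustrative only. It assumes embeddings are already computed, uses plain cosine similarity with an exhaustive scan, and the helper names (`retrieve_examples`, `build_prompt`) and the prompt template are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Each bank entry: (embedding, patent text, list of category labels).
Entry = Tuple[Sequence[float], str, List[str]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def retrieve_examples(
    query: Sequence[float], bank: List[Entry], k: int = 3
) -> List[Tuple[str, List[str]]]:
    """Return the k labeled patents whose embeddings are most similar to the query."""
    # Exhaustive scan; a real system would likely use an approximate index instead.
    ranked = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)
    return [(text, labels) for _, text, labels in ranked[:k]]


def build_prompt(query_text: str, examples: List[Tuple[str, List[str]]]) -> str:
    """Assemble a few-shot prompt from retrieved examples (hypothetical template)."""
    parts = [f"Patent: {text}\nLabels: {', '.join(labels)}" for text, labels in examples]
    # The unlabeled query goes last so the model completes its label list.
    parts.append(f"Patent: {query_text}\nLabels:")
    return "\n\n".join(parts)
```

With a toy bank of 2-D embeddings (made-up texts and labels), a query near `[1.0, 0.05]` retrieves the two "mechanical" examples first, and `build_prompt` places them ahead of the unlabeled query.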

## Key findings

- The framework achieves Macro-F1 and Micro-F1 scores of 0.847 and 0.892, respectively, on a curated dataset of 15,000 annotated drone patents across ten categories.
- These scores correspond to improvements of 30% and 23%, respectively, over few-shot baselines.
- Contrastive pre-training improves performance on underrepresented categories by up to 16% over transformer-based approaches.
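The findings report both Macro-F1 and Micro-F1, and the two average differently. Macro-F1 computes F1 per category and averages the results, so every category counts equally. Micro-F1 pools true positives, false positives and false negatives across all categories first, so frequent categories dominate. A Micro-F1 above Macro-F1 is therefore consistent with weaker results on rarer categories. The sketch below uses hypothetical category names `"A"` and `"B"` and treats an undefined per-label F1 as 0.0, which is one common convention.

```python
from typing import Dict, List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when a label is never true nor predicted."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_micro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    # Per-label [tp, fp, fn] counts accumulated over all documents.
    counts: Dict[str, List[int]] = {lab: [0, 0, 0] for lab in labels}
    for true, pred in zip(y_true, y_pred):
        for lab in labels:
            if lab in pred and lab in true:
                counts[lab][0] += 1  # true positive
            elif lab in pred:
                counts[lab][1] += 1  # false positive
            elif lab in true:
                counts[lab][2] += 1  # false negative
    # Macro-F1: unweighted mean of per-label F1 scores.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro-F1: a single F1 over counts pooled across labels.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)
```

In a toy case where frequent label `"A"` is always predicted correctly and rare label `"B"` is missed once, Macro-F1 is 0.5 while Micro-F1 is 8/9 (about 0.889), illustrating how a single weak minority category lowers Macro-F1 far more than Micro-F1.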

## Abstract

The rapid expansion of patent databases poses increasing challenges for multi-label patent classification, particularly for inventions spanning multiple technological domains. Conventional approaches are hindered by high annotation costs and limited scalability, while often neglecting the semantic structure of patent documents. Here, we present a retrieval-enhanced few-shot learning framework that combines patent-specific contrastive pre-training with semantic retrieval to enable scalable multi-label classification. Drone technologies are selected as the evaluation domain due to their multidisciplinary characteristics encompassing mechanical, electronic, and software aspects. The proposed method learns domain-adapted embeddings that capture multi-label co-occurrence patterns and leverages retrieval-augmented few-shot learning with structured reasoning to reduce reliance on extensive annotations. Experiments on a curated dataset of 15,000 annotated drone patents across ten categories demonstrate that the framework achieves Macro-F1 and Micro-F1 scores of 0.847 and 0.892, corresponding to improvements of 30% and 23% over few-shot baselines. Furthermore, contrastive pre-training yields notable benefits for underrepresented categories, with performance improvements reaching 16% over transformer-based approaches. These results indicate that the proposed approach offers an effective and resource-efficient solution for multi-label patent classification, with potential to improve the scalability and accessibility of intellectual property analysis.
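The abstract states that contrastive pre-training learns embeddings that capture multi-label co-occurrence patterns, but this summary does not give the exact objective. One plausible form, shown here purely as an assumption, is a multi-label variant of the supervised contrastive loss: any two patents that share at least one category are treated as positives, weighted by the Jaccard overlap of their label sets, so patents with near-identical category combinations are pulled together most strongly. The function names and the temperature value below are hypothetical.

```python
import math
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Label-set overlap in [0, 1]; 0 means the two patents share no category."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def multilabel_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[Set[str]],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Jaccard-weighted supervised contrastive loss over one batch (assumed variant)."""
    z = [l2_normalize(e) for e in embeddings]
    if len(z) < 2:
        return 0.0
    total, n_anchors = 0.0, 0
    for i in range(len(z)):
        # Temperature-scaled cosine similarity from anchor i to every other patent.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(len(z))
            if j != i
        }
        # Numerically stable log of the softmax denominator (log-sum-exp).
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Positives share at least one label; weight each by label-set overlap.
        weights = {j: jaccard(labels[i], labels[j]) for j in logits}
        w_sum = sum(weights.values())
        if w_sum == 0.0:
            continue  # anchor has no positive partner in this batch
        # Weighted mean of -log softmax probability over the positives.
        loss_i = -sum(w * (logits[j] - log_denom) for j, w in weights.items()) / w_sum
        total += loss_i
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0
```

The loss is small when patents sharing categories sit close together and patents with disjoint categories sit far apart, and large in the opposite arrangement. Weighting by Jaccard overlap rather than using a binary "shares a label" rule is the design choice that lets label co-occurrence shape the embedding space.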

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12822942/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12822942/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12822942/full.md

---
Source: https://tomesphere.com/paper/PMC12822942