# Solution for the EPO CodeFest on Green Plastics: Hierarchical multi-label classification of patents relating to green plastics using deep learning

**Authors:** Tingting Qiao, Gonzalo Moro Perez

arXiv: 2302.13784 · 2023-02-28

## TL;DR

This paper presents a hierarchical multi-label classification approach for patents related to green plastics, an emerging field with no existing classification scheme and no labeled data. The approach uses two deep learning models built on SciBERT.

## Contribution

The authors propose a novel classification scheme and an automatic label assignment method, along with two deep learning models, setting a new benchmark for classifying green plastics patents.

## Key findings

- Models outperform baseline methods in classification accuracy
- The approach effectively captures high-level semantic information
- The models provide interpretable insights through word importance visualization

## Abstract

This work addresses hierarchical multi-label patent classification for patents disclosing technologies related to green plastics. This is an emerging field for which there is currently no classification scheme, and hence no labeled data is available, making this task particularly challenging. We first propose a classification scheme for this technology and a way to train a machine learning model to classify patents into the proposed scheme. To achieve this, we devise a strategy to automatically assign labels to patents in order to create a labeled training dataset that can be used to train a classification model in a supervised learning setting. Using this training dataset, we develop two classification models: a SciBERT Neural Network (SBNN) model and a SciBERT Hierarchical Neural Network (SBHNN) model. Both models use a BERT model as a feature extractor with a neural network on top as a classifier. We carry out extensive experiments and report commonly used evaluation metrics for this challenging classification problem. The experimental results verify the validity of our approach and show that our model sets a very strong benchmark for this problem. We also interpret our models by visualizing the word importance given by the trained model, which indicates that the model is capable of extracting high-level semantic information from input documents. Finally, we highlight how our solution fulfills the evaluation criteria for the EPO CodeFest and outline possible directions for future work. Our code has been made available at https://github.com/epo/CF22-Green-Hands
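The abstract does not say how per-label scores are turned into a hierarchical multi-label prediction. As a rough illustration only, the sketch below shows one common decoding rule: apply a sigmoid to each label's logit, threshold it, and keep a child label only if its parent was also selected. The label names, the `PARENT` map, the `predict_labels` helper, and the 0.5 threshold are all hypothetical and are not taken from the paper or its repository. The logits stand in for the output of a classifier head placed on top of a BERT-style feature extractor.

```python
import math

# Hypothetical two-level label hierarchy (illustrative names only,
# NOT the classification scheme proposed in the paper).
PARENT = {
    "recycling": None,
    "recycling/mechanical": "recycling",
    "recycling/chemical": "recycling",
    "alternative_plastics": None,
    "alternative_plastics/bio_based": "alternative_plastics",
}


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))


def depth(label: str) -> int:
    """Number of ancestors of a label in the PARENT map."""
    d = 0
    while PARENT[label] is not None:
        label = PARENT[label]
        d += 1
    return d


def predict_labels(logits: dict, threshold: float = 0.5) -> set:
    """Select labels whose probability passes the threshold,
    keeping a child only when its parent was selected too."""
    probs = {label: sigmoid(z) for label, z in logits.items()}
    selected = set()
    # Visit shallower labels first so parents are decided before children.
    for label in sorted(probs, key=depth):
        parent = PARENT[label]
        if probs[label] >= threshold and (parent is None or parent in selected):
            selected.add(label)
    return selected


# Assumed example logits for a single patent document.
example_logits = {
    "recycling": 2.0,
    "recycling/mechanical": 1.0,
    "recycling/chemical": -1.0,
    "alternative_plastics": -2.0,
    "alternative_plastics/bio_based": 3.0,
}
predicted = predict_labels(example_logits)
```

In this example `alternative_plastics/bio_based` has a high score but is dropped because its parent falls below the threshold, which keeps every prediction consistent with the hierarchy. Other designs are possible, such as conditioning the child classifier on the parent's output, and the paper's SBHNN model may work differently.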

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13784/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13784/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/2302.13784/full.md

---
Source: https://tomesphere.com/paper/2302.13784