Named entity recognition in chemical patents using ensemble of contextual language models

Jenny Copara; Nona Naderi; Julien Knafou; Patrick Ruch; Douglas Teodoro

arXiv:2007.12569·cs.CL·September 18, 2020·6 cites

Open Access

TL;DR

This paper presents an ensemble of contextual language models for extracting chemical reaction information from patents. Its best model, a majority-vote ensemble, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%.

Contribution

The paper introduces a new ensemble approach that combines transformer models trained on generic and specialised corpora to extract information from chemical patents.

Findings

01

Achieved an exact F1-score of 92.30%

02

Achieved a relaxed F1-score of 96.24%

03

Ensemble models outperform individual models in chemical patent NER
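The page reports exact and relaxed F1-scores without defining them. As a rough illustration only, the sketch below assumes that an exact match requires identical span boundaries and entity type, and that a relaxed match requires the same type with any character overlap. The `Span` layout, the function names, and the type labels `"X"` and `"Y"` are hypothetical and are not taken from the paper or the challenge's official scorer.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_offset, end_offset_exclusive, entity_type)
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """Assumed relaxed criterion: same entity type and overlapping offsets."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Span-level F1 under exact or (assumed) relaxed matching."""
    match = _overlaps if relaxed else (lambda a, b: a == b)
    # Precision counts predicted spans that match some gold span.
    pred_hits = sum(any(match(p, g) for g in gold) for p in pred)
    # Recall counts gold spans that are matched by some predicted span.
    gold_hits = sum(any(match(g, p) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the second prediction has the right type but a shifted start.
gold_spans = [(0, 5, "X"), (10, 20, "Y")]
pred_spans = [(0, 5, "X"), (12, 20, "Y")]
exact_f1 = f1_score(gold_spans, pred_spans)
relaxed_f1 = f1_score(gold_spans, pred_spans, relaxed=True)
```

Under these assumptions a boundary error lowers the exact score but not the relaxed one, which is consistent with the relaxed figure reported above being the higher of the two.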

Abstract

Chemical patent documents describe a broad range of applications holding key reaction and compound information, such as chemical structure, reaction formulas, and molecular properties. These informational entities should be first identified in text passages to be utilized in downstream tasks. Text mining provides means to extract relevant information from chemical patents through information extraction techniques. As part of the Information Extraction task of the Cheminformatics Elsevier Melbourne University challenge, in this work we study the effectiveness of contextualized language models to extract reaction information in chemical patents. We assess transformer architectures trained on generic and specialised corpora to propose a new ensemble model. Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%. The…
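The abstract names a majority ensemble but the visible text does not say how votes are combined. A minimal sketch of one plausible reading follows, assuming each model emits one label per token and a label is kept only when more than half of the models agree. The function name `majority_vote`, the fallback to `"O"`, and the labels `B-X`, `I-X`, and `B-Y` are assumptions made for illustration, not details from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: str = "O") -> List[str]:
    """Merge per-token labels from several models by strict majority.

    predictions[m][t] is the label that model m assigns to token t.
    If no label is chosen by more than half of the models, the token
    receives `fallback` (an assumption of this sketch).
    """
    if not predictions:
        return []
    n_models = len(predictions)
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same tokens")
    merged = []
    for t in range(n_tokens):
        label, count = Counter(p[t] for p in predictions).most_common(1)[0]
        merged.append(label if count > n_models / 2 else fallback)
    return merged


# Three hypothetical models labelling the same four tokens.
model_a = ["B-X", "I-X", "O", "B-Y"]
model_b = ["B-X", "O", "O", "B-Y"]
model_c = ["B-X", "I-X", "O", "O"]
ensemble_labels = majority_vote([model_a, model_b, model_c])
```

Voting per token can produce label sequences that are not valid under a BIO scheme, so a real system might vote over whole spans or repair the sequence afterwards; that choice is left open here.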

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques