When Specialization Helps: Using Pooled Contextualized Embeddings to Detect Chemical and Biomedical Entities in Spanish

Manuel Stoeckel; Wahed Hemati; Alexander Mehler

arXiv:1910.03387·cs.CL·October 9, 2019

TL;DR

This paper presents a method for recognizing chemical and biomedical entities in Spanish medical texts using pooled contextualized embeddings with a BiLSTM-CRF model, reaching 89.76% F1-score with pre-trained embeddings and 90.52% with domain-specific ones.

Contribution

The paper introduces the Spanish Health Corpus, a new corpus of articles and papers from Spanish health science journals, and shows that domain-specific embeddings trained on it improve entity recognition performance.

Findings

01. Achieved 89.76% F1-score with pre-trained embeddings.

02. Improved to 90.52% F1-score with specialized embeddings.

03. First application of pooled contextualized embeddings for Spanish biomedical NER.
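
To make the F1 figures above concrete, here is a minimal sketch of how an entity-level micro F1-score can be computed from BIO-tagged sequences. It assumes strict exact-match scoring (a predicted span counts only if its boundaries and label both match the gold span); the source does not state the exact matching rules of the official PharmaCoNER evaluation, and the labels `CHEM` and `PROT` as well as the function names are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    An I- tag that does not continue a span with the same label is ignored.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match micro F1 over all sentences (assumed scoring scheme)."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)   # spans found with correct boundaries and label
        fp += len(p - g)   # predicted spans that are not in the gold set
        fn += len(g - p)   # gold spans that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one of two gold entities is recovered.
gold = [["B-CHEM", "I-CHEM", "O", "B-PROT"]]
pred = [["B-CHEM", "I-CHEM", "O", "O"]]
score = micro_f1(gold, pred)
```

In this toy case precision is 1.0 and recall is 0.5, so the score is 2/3. Under this scheme a partially overlapping span earns no credit, which is one reason small F1 differences such as those reported here can still reflect a meaningful number of corrected entities.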

Abstract

The recognition of pharmacological substances, compounds and proteins is an essential preliminary work for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings, word and sub-word embeddings using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate in our model training. We achieve a result of 89.76% F1-score using pre-trained embeddings and are able to improve these results to 90.52% F1-score using specialized…
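
The abstract names Pooled Contextualized Embeddings without describing how they work. The sketch below illustrates the general idea as it is commonly described: every contextual embedding produced for a word is stored in a per-word memory, the stored vectors are pooled (mean, min, or max), and the pooled vector is concatenated with the current contextual embedding. This is a pure-Python toy, not FLAIR's implementation or the authors' code; the class name `PooledMemory`, its method names, and the two-dimensional vectors are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


class PooledMemory:
    """Toy per-word memory that pools all contextual embeddings seen so far."""

    def __init__(self, pooling: str = "mean") -> None:
        if pooling not in ("mean", "min", "max"):
            raise ValueError("pooling must be 'mean', 'min' or 'max'")
        self.pooling = pooling
        self.memory: Dict[str, List[Vector]] = defaultdict(list)

    def embed(self, word: str, contextual: Vector) -> Vector:
        """Store the new contextual vector and return [contextual ; pooled]."""
        self.memory[word].append(list(contextual))
        # Transpose so each tuple holds one dimension across all stored vectors.
        columns = list(zip(*self.memory[word]))
        if self.pooling == "mean":
            pooled = [sum(col) / len(col) for col in columns]
        elif self.pooling == "min":
            pooled = [min(col) for col in columns]
        else:
            pooled = [max(col) for col in columns]
        return list(contextual) + pooled


# Hypothetical usage: the same word seen in two different contexts.
memory = PooledMemory(pooling="mean")
first = memory.embed("paracetamol", [1.0, 0.0])
second = memory.embed("paracetamol", [3.0, 2.0])
```

On the first call the pooled half equals the contextual half, since the memory holds a single vector. On the second call the pooled half becomes the mean of both occurrences, so a mention in an uninformative context can still draw on what was seen for that word elsewhere.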
