When Specialization Helps: Using Pooled Contextualized Embeddings to Detect Chemical and Biomedical Entities in Spanish
Manuel Stoeckel, Wahed Hemati, Alexander Mehler

TL;DR
This paper presents a method for recognizing chemical and biomedical entities in Spanish medical texts using pooled contextualized embeddings with a BiLSTM-CRF model, reaching an F1-score of up to 90.52%.
Contribution
It introduces the Spanish Health Corpus, a new corpus of articles and papers from Spanish health science journals, and shows that domain-specific embeddings trained on it improve entity recognition performance.
Findings
Achieved 89.76% F1-score with pre-trained embeddings.
Improved to 90.52% F1-score with specialized embeddings.
First application of pooled contextualized embeddings for Spanish biomedical NER.
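For readers unfamiliar with how an entity-level F1-score such as those above is typically computed, the following is a minimal sketch. It assumes exact matching of span boundaries and labels; the challenge's official scorer may differ in detail. The function name `span_f1` and the example spans are hypothetical and used only for illustration.

```python
from typing import Set, Tuple

# A predicted or gold entity: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Entity-level F1 under exact span-and-label matching (an assumption)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative spans only; they are not taken from the paper's data.
gold = {(0, 11, "CHEMICAL"), (20, 28, "PROTEIN")}
predicted = {(0, 11, "CHEMICAL"), (30, 35, "PROTEIN")}
score = span_f1(gold, predicted)
```

Here one of two predictions is correct and one of two gold entities is found, so precision and recall are both 0.5.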
Abstract
The recognition of pharmacological substances, compounds and proteins is an essential preliminary work for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings, word and sub-word embeddings using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate in our model training. We achieve a result of 89.76% F1-score using pre-trained embeddings and are able to improve these results to 90.52% F1-score using specialized…
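The pooling idea behind Pooled Contextualized Embeddings can be sketched in a few lines. As generally described, each word's contextual vectors are collected in a memory as the word is encountered, pooled (for example by mean, min, or max), and the pooled vector is concatenated with the current contextual vector. The toy class below is a hypothetical illustration of that idea using plain Python lists; it is not FLAIR's implementation and not the authors' code, and the name `PooledMemory` is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


class PooledMemory:
    """Hypothetical helper: remembers every contextual vector seen per word."""

    def __init__(self, pooling: str = "mean") -> None:
        if pooling not in {"mean", "min", "max"}:
            raise ValueError(f"unsupported pooling: {pooling}")
        self.pooling = pooling
        self.memory: Dict[str, List[Vector]] = defaultdict(list)

    def embed(self, word: str, contextual: Vector) -> Vector:
        # Store the new contextual vector, then pool over all occurrences so far.
        self.memory[word].append(list(contextual))
        columns = list(zip(*self.memory[word]))
        if self.pooling == "mean":
            pooled = [sum(c) / len(c) for c in columns]
        elif self.pooling == "min":
            pooled = [min(c) for c in columns]
        else:
            pooled = [max(c) for c in columns]
        # Final representation: current contextual vector followed by the pooled one.
        return list(contextual) + pooled


memory = PooledMemory(pooling="mean")
first = memory.embed("paracetamol", [1.0, 3.0])
second = memory.embed("paracetamol", [3.0, 5.0])
```

On the second occurrence the pooled half reflects both contexts, which is what lets a rare term seen in an uninformative sentence borrow evidence from its earlier, clearer mentions.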
