Spanish Biomedical and Clinical Language Embeddings

Asier Gutiérrez-Fandiño; Jordi Armengol-Estapé; Casimiro Pio Carrino; Ona De Gibert; Aitor Gonzalez-Agirre; Marta Villegas

arXiv:2102.12843·cs.CL·February 26, 2021·5 cites

TL;DR

This paper presents Spanish biomedical and clinical word and sub-word embeddings trained with FastText, using Byte Pair Encoding (BPE) for the sub-word units. It reports that the biomedical word embeddings improve on previous versions and attributes the gain to more training data.

Contribution

It introduces Spanish biomedical and clinical embeddings computed with FastText, with BPE for the sub-word units, and shows that more training data yields better representations.

Findings

01 Biomedical embeddings outperform previous versions

02 More data improves embedding quality

03 BPE effectively captures sub-word information

Abstract

We computed both word and sub-word embeddings using FastText. For the sub-word embeddings, we selected the Byte Pair Encoding (BPE) algorithm to represent the sub-words. We evaluated the biomedical word embeddings and obtained better results than previous versions, which suggests that more data yields better representations.
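
To make the sub-word step concrete, here is a minimal sketch of how BPE merge rules can be learned and then used to split a word into sub-word units. It is an illustration only, not the authors' pipeline: the toy word counts, the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the number of merges are all assumptions made for this example, and the abstract does not say which BPE implementation or settings were used.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return `symbols` with every adjacent occurrence of `pair` fused into one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merge rules from a {word: count} mapping."""
    # Start from characters, plus a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair wins; ties are broken lexicographically
        # so that the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split `word` into sub-word units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy word counts, invented for illustration (not the paper's corpus).
toy_counts = {"hepatitis": 5, "gastritis": 4, "artritis": 3, "hepático": 2}
toy_merges = learn_bpe(toy_counts, num_merges=10)
toy_units = segment("dermatitis", toy_merges)
```

In a setup like the one the abstract describes, the corpus would be segmented into such units and the resulting token stream passed to the embedding trainer, so that each sub-word unit receives its own vector. How the authors connected BPE output to FastText is not stated here.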

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling