Spanish Biomedical and Clinical Language Embeddings
Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Casimiro Pio Carrino, Ona De Gibert, Aitor Gonzalez-Agirre, Marta Villegas

TL;DR
This paper presents Spanish biomedical and clinical language embeddings trained with FastText, uses Byte Pair Encoding (BPE) for the sub-word representations, and shows that embedding quality improves with more training data.
Contribution
It introduces Spanish biomedical embeddings using FastText and BPE, showing that larger datasets lead to better representations.
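FastText builds word representations from character n-grams. As an illustration only (not the authors' code), the sketch below mimics that extraction step: each word is wrapped in `<` and `>` boundary markers and every n-gram of length `min_n` to `max_n` is collected. The defaults of 3 and 6 mirror fastText's unsupervised `-minn`/`-maxn` defaults; the function name `char_ngrams` is hypothetical, and the real library additionally hashes n-grams into buckets and keeps a separate vector for the whole word.

```python
def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """Return fastText-style character n-grams for a single word.

    Hypothetical helper for illustration: the word is wrapped in the
    boundary markers '<' and '>' and every substring of length
    min_n..max_n is collected, shortest n-grams first.
    """
    if min_n < 1 or max_n < min_n:
        raise ValueError("require 1 <= min_n <= max_n")
    token = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the bracketed token.
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


if __name__ == "__main__":
    print(char_ngrams("fiebre", 3, 3))
    # ['<fi', 'fie', 'ieb', 'ebr', 'bre', 're>']
```

Python indexes strings by code point, so an accented Spanish letter such as the "ó" in "infección" counts as a single character, which keeps the n-grams aligned with what a reader would call characters.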
Findings
Biomedical embeddings outperform previous versions
More data improves embedding quality
BPE effectively captures sub-word information
Abstract
We computed both word and sub-word embeddings using FastText. For the sub-word embeddings, we selected the Byte Pair Encoding (BPE) algorithm to represent sub-words. We evaluated the biomedical word embeddings and obtained better results than previous versions, which suggests that more data yields better representations.
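The abstract does not state how BPE was trained, how many merges were used, or which tool was involved. The following is a minimal sketch of the standard BPE merge-learning procedure on a toy word-frequency table: every word starts as a sequence of characters plus an end-of-word marker `</w>`, and the most frequent adjacent symbol pair is repeatedly merged into a new symbol. The function names `learn_bpe`, `merge_pair` and `apply_bpe` and the toy vocabulary are hypothetical, not the authors' pipeline.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Merge every left-to-right occurrence of `pair` in `symbols`."""
    merged, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple]:
    """Learn up to `num_merges` BPE merge operations from word counts."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Ties go to the pair counted first (Counter keeps insertion order).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def apply_bpe(word: str, merges: list[tuple]) -> list[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe({"gen": 5, "genes": 3, "genoma": 2}, 3)
    print(merges)                      # [('g', 'e'), ('ge', 'n'), ('gen', '</w>')]
    print(apply_bpe("genes", merges))  # ['gen', 'e', 's', '</w>']
```

In a setup like the one the abstract describes, a corpus segmented this way could then be passed to FastText to obtain sub-word embeddings; the number of merges is the main knob controlling how coarse the resulting units are.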
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
