CSpace: a concept embedding space for biomedical applications
Danilo Tomasoni, Luca Marchetti

TL;DR
CSpace is a compact biomedical concept embedding model that improves semantic search and concept-relatedness measurement while remaining computationally efficient.
Contribution
CSpace introduces a concise embedding space of biomedical concepts that outperforms the alternatives on out-of-vocabulary ratio and semantic textual similarity, and supports efficient concept-relatedness computation.
Findings
CSpace outperforms existing models on out-of-vocabulary ratio and semantic textual similarity.
It performs comparably to transformer-based models on the sentence similarity task, with a much simpler architecture.
CSpace integrates ontological identifiers for efficient disease, gene, and condition relatedness analysis.
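As a rough illustration of the out-of-vocabulary ratio mentioned above, the sketch below computes the fraction of tokens in a text that an embedding vocabulary does not cover. The exact definition used in the paper is not given here, so this token-level definition is an assumption, and the function name `oov_ratio` and the toy vocabulary are hypothetical.

```python
def oov_ratio(tokens, vocabulary):
    """Return the fraction of tokens that are missing from the vocabulary.

    Assumption: the ratio is computed per token occurrence, not per unique term.
    """
    tokens = list(tokens)
    if not tokens:
        return 0.0  # no tokens means nothing can be out of vocabulary
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Toy data for illustration only; not taken from CSpace.
vocab = {"brca1", "breast", "cancer", "mutation"}
tokens = ["brca1", "mutation", "in", "breast", "cancer"]
ratio = oov_ratio(tokens, vocab)
print(f"OOV ratio: {ratio:.2f}")
```

Under this definition a lower ratio means the embedding covers more of the terms that actually appear in the text.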
Abstract
The rise of transformer-based architectures has dramatically improved our ability to analyze natural language. However, the power and flexibility of these general-purpose models come at the cost of highly complex model architectures with billions of parameters that are not always needed. In this work, we present CSpace: a concise word embedding of biomedical concepts that outperforms all alternatives in terms of out-of-vocabulary ratio and on the semantic textual similarity task, and performs comparably to transformer-based alternatives on the sentence similarity task. This ability can serve as the foundation for semantic search by enabling efficient retrieval of conceptually related terms. Additionally, CSpace incorporates ontological identifiers (MeSH, NCBI gene and taxonomy IDs), enabling computationally efficient disease, gene or condition relatedness measurement,…
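The retrieval of conceptually related terms described in the abstract can be sketched as a nearest-neighbour lookup over concept vectors. The example below is a minimal sketch that assumes relatedness is scored with cosine similarity, which the text above does not confirm. The identifiers, the vectors and the helper names `cosine` and `most_related` are hypothetical placeholders, not CSpace's actual API or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def most_related(query, embeddings, top_k=2):
    """Rank all other concepts by cosine similarity to the query concept."""
    query_vec = embeddings[query]
    scored = [
        (key, cosine(query_vec, vec))
        for key, vec in embeddings.items()
        if key != query
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors keyed by made-up ontology-style identifiers.
toy_embeddings = {
    "MESH:DISEASE_A": [1.0, 0.0, 0.0],
    "NCBIGene:GENE_X": [0.9, 0.1, 0.0],
    "NCBITaxon:SPECIES_Y": [0.0, 1.0, 0.0],
    "MESH:DISEASE_B": [0.0, 0.0, 1.0],
}

ranked = most_related("MESH:DISEASE_A", toy_embeddings)
for concept_id, score in ranked:
    print(concept_id, round(score, 3))
```

Keying the vectors by ontology identifiers rather than surface strings means a disease-to-gene relatedness query reduces to a dictionary lookup followed by a similarity computation.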
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Genomics and Rare Diseases
