TL;DR
BioConceptVec is a large-scale set of biomedical concept embeddings learned from PubMed literature, improving the semantic understanding of biological concepts for various bioinformatics tasks.
Contribution
The paper introduces BioConceptVec, a novel large-scale biomedical concept embedding resource learned from literature, outperforming existing methods in multiple bioinformatics applications.
Findings
BioConceptVec captures semantic relationships beyond simple co-occurrence.
It outperforms existing methods across multiple bioinformatics tasks, evaluated on over 25 million instances from nine biological datasets.
The embeddings are publicly available for research use.
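As a rough illustration of how concept embeddings like these are typically queried, the sketch below ranks concepts by cosine similarity to a query concept. The concept names (`Gene_A`, `Disease_X`, and so on), the 4-dimensional vectors, and the helper functions are all hypothetical and invented for this example. They are not real BioConceptVec identifiers or values, and this is not the paper's evaluation procedure.

```python
import math

# Toy 4-dimensional vectors keyed by made-up concept names.
# These numbers are invented for illustration only; they are NOT
# real BioConceptVec embeddings or identifiers.
toy_embeddings = {
    "Gene_A": [0.9, 0.1, 0.0, 0.2],
    "Gene_B": [0.8, 0.2, 0.1, 0.3],
    "Disease_X": [0.1, 0.9, 0.3, 0.0],
    "Chemical_Y": [0.0, 0.2, 0.9, 0.4],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for zero vectors; treat as 0
    return dot / (norm_u * norm_v)


def nearest_concepts(query, embeddings, top_k=2):
    """Rank all other concepts by cosine similarity to the query concept."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, score in nearest_concepts("Gene_A", toy_embeddings):
        print(f"{name}\t{score:.3f}")
```

With real embeddings, the dictionary would be replaced by vectors loaded from the released files. The ranking logic stays the same, although a vectorized library would be preferable at the scale of several hundred thousand concepts.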
Abstract
Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology, such as protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics via vector representations (a.k.a. embeddings) of over 400,000 biological concepts mentioned in the entire collection of PubMed abstracts. Our learned embeddings, namely BioConceptVec, can capture related concepts based on their surrounding contextual information in the literature, which goes beyond exact term match or co-occurrence-based methods. BioConceptVec has been thoroughly evaluated in multiple bioinformatics tasks consisting of over 25 million instances from nine different biological datasets. The…
