BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations
Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan

TL;DR
BioT5 is a novel pre-training framework that enhances cross-modal biological data integration by utilizing chemical knowledge, natural language associations, and robust molecular representations to improve performance in bio-entity related tasks.
Contribution
It introduces BioT5, a comprehensive pre-training model that effectively combines structured and unstructured biological knowledge with natural language, addressing limitations of previous models.
Findings
BioT5 outperforms existing models on various biological tasks.
Uses SELFIES, a 100% robust molecular representation, so that generated molecule strings are always valid.
Effectively distinguishes and leverages structured and unstructured knowledge.
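To make the robustness point concrete, here is a minimal sketch of how a generated SMILES string can be structurally invalid. It is a simplified illustration, not the authors' code and not a full SMILES parser: the function name `smiles_syntax_errors` is hypothetical, and it checks only branch parentheses and single-digit ring-closure labels, ignoring valence, aromaticity, and `%nn` ring labels. SELFIES avoids this class of failure by design, since every SELFIES string decodes to a valid molecule.

```python
from typing import List


def smiles_syntax_errors(smiles: str) -> List[str]:
    """Return structural problems found in a SMILES string (simplified check)."""
    errors: List[str] = []
    depth = 0              # current nesting depth of branch parentheses
    open_rings = set()     # ring-closure digits seen an odd number of times
    in_bracket = False     # True while inside a bracket atom such as [13CH4]

    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            continue       # digits inside [...] are isotopes, charges, or H counts
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                errors.append("unmatched ')'")
                depth = 0
        elif ch.isdigit():
            # A ring label opens on first sight and closes on the second.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)

    if depth > 0:
        errors.append(f"{depth} unclosed '('")
    if open_rings:
        errors.append("unclosed ring label(s): " + ", ".join(sorted(open_rings)))
    return errors


if __name__ == "__main__":
    for s in ["c1ccccc1", "c1ccccc", "CC(C"]:
        print(s, smiles_syntax_errors(s))
```

A sequence model that emits SMILES token by token can produce strings like `c1ccccc` (the ring is never closed), which no chemistry toolkit can turn into a molecule. A representation in which every string is decodable removes that failure case at the output stage.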
Abstract
Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery. However, current models exhibit several limitations, such as the generation of invalid molecular SMILES, underutilization of contextual information, and equal treatment of structured and unstructured knowledge. To address these issues, we propose BioT5, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. BioT5 utilizes SELFIES for robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature. Furthermore, BioT5 distinguishes between structured and unstructured knowledge, leading to more effective utilization of information. After…
Code & Models
- 🤗 QizhiPei/biot5-base (model) · 124 dl · ♡ 9
- 🤗 QizhiPei/biot5-base-dti-bindingdb (model) · 8 dl
- 🤗 QizhiPei/biot5-base-dti-biosnap (model) · 1 dl
- 🤗 QizhiPei/biot5-base-dti-human (model) · 2 dl
- 🤗 QizhiPei/biot5-base-mol2text (model) · 28 dl · ♡ 4
- 🤗 QizhiPei/biot5-base-text2mol (model) · 234 dl · ♡ 4
- 🤗 QizhiPei/biot5-base-peer-binloc (model) · 3 dl
- 🤗 QizhiPei/biot5-base-peer-human_ppi (model) · 1 dl · ♡ 1
- 🤗 QizhiPei/biot5-base-peer-solubility (model) · 3 dl · ♡ 1
- 🤗 QizhiPei/biot5-base-peer-yeast_ppi (model) · 1 dl
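The checkpoints listed above could be selected and loaded along the following lines. This is a sketch under assumptions rather than the authors' documented usage: the repository ids are copied from the listing, the short task keys and the helper names `checkpoint_for` and `load` are hypothetical, and it assumes the checkpoints resolve through the generic `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes of the Hugging Face `transformers` library. Prompt formats for each task are not given here and would need to come from the model cards.

```python
from typing import Dict

# Repository ids copied from the listing above; the short task keys are
# hypothetical labels chosen for this sketch.
CHECKPOINTS: Dict[str, str] = {
    "base": "QizhiPei/biot5-base",
    "dti-bindingdb": "QizhiPei/biot5-base-dti-bindingdb",
    "dti-biosnap": "QizhiPei/biot5-base-dti-biosnap",
    "dti-human": "QizhiPei/biot5-base-dti-human",
    "mol2text": "QizhiPei/biot5-base-mol2text",
    "text2mol": "QizhiPei/biot5-base-text2mol",
    "peer-binloc": "QizhiPei/biot5-base-peer-binloc",
    "peer-human_ppi": "QizhiPei/biot5-base-peer-human_ppi",
    "peer-solubility": "QizhiPei/biot5-base-peer-solubility",
    "peer-yeast_ppi": "QizhiPei/biot5-base-peer-yeast_ppi",
}


def checkpoint_for(task: str) -> str:
    """Map a short task key to its Hugging Face repository id."""
    try:
        return CHECKPOINTS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; choose from {sorted(CHECKPOINTS)}"
        ) from None


def load(task: str):
    """Download and return (tokenizer, model) for a task.

    Assumes the checkpoint is loadable via the generic Auto classes.
    """
    # Imported lazily so the lookup above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    repo_id = checkpoint_for(task)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load("text2mol")` would fetch the weights over the network on first use; `checkpoint_for` alone needs no third-party packages.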
Taxonomy
Topics: Machine Learning in Bioinformatics · Computational Drug Discovery Methods · Biomedical Text Mining and Ontologies
