Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
Qizhi Pei, Zhimeng Zhou, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, Lijun Wu, and Rui Yan

TL;DR
This survey reviews recent advances in multi-modal learning that combines biomolecular modeling with natural language processing, highlighting representations, applications, resources, and future directions in this interdisciplinary field.
Contribution
The survey provides a comprehensive overview of recent progress, resources, and future research directions in integrating biomolecular data with natural language through multi-modal learning.
Findings
Survey of biomolecular representations including sequences and 3D structures
Analysis of multi-modal integration strategies for language and molecular data
Compilation of datasets and resources for future research
Abstract
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology. This approach leverages the rich, multifaceted descriptions of biomolecules contained within textual data sources to enhance our fundamental understanding and enable downstream computational tasks such as biomolecule property prediction. The fusion of the nuanced narratives expressed through natural language with the structural and functional specifics of biomolecules described via various molecular modeling techniques opens new avenues for comprehensively representing and analyzing biomolecules. By incorporating the contextual language data that surrounds biomolecules into their modeling, BL aims to capture a holistic view encompassing both the symbolic qualities conveyed through language as well as…
Taxonomy
Topics: Machine Learning in Bioinformatics
