KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge
Zaifei Yang, Hong Chang, Ruibing Hou, Shiguang Shan, Xilin Chen

TL;DR
KnowMol is a multi-modal molecular large language model built on KnowMol-100K, a dataset of 100K fine-grained, multi-level molecular annotations, and on a chemically-informative molecular representation, with the aim of improving molecular understanding and generation.
Contribution
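The paper introduces KnowMol-100K, a dataset of 100K fine-grained molecular annotations across multiple levels, and a chemically-informative molecular representation, and builds on both to develop KnowMol, a multi-modal molecular LLM. Together these target two limitations of current models: inadequate textual descriptions and suboptimal molecular representation strategies during pretraining.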
Findings
Achieves superior performance in molecular understanding tasks
Demonstrates improved molecular generation capabilities
Sets new benchmarks on multiple molecular tasks
Abstract
Molecular large language models have garnered widespread attention due to their promising potential in molecular applications. However, current molecular large language models face significant limitations in understanding molecules due to inadequate textual descriptions and suboptimal molecular representation strategies during pretraining. To address these challenges, we introduce KnowMol-100K, a large-scale dataset with 100K fine-grained molecular annotations across multiple levels, bridging the gap between molecules and textual descriptions. Additionally, we propose a chemically-informative molecular representation that addresses limitations in existing molecular representation strategies. Building upon these innovations, we develop KnowMol, a state-of-the-art multi-modal molecular large language model. Extensive experiments demonstrate that KnowMol achieves superior…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Machine Learning in Bioinformatics
