Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction
Yasir Ghunaim, Robert Hoehndorf

TL;DR
This paper enhances molecular property prediction by integrating the large-scale ChEBI chemical knowledge graph into the knowledge-enhanced pre-training of the KANO model, improving performance on 9 of 14 datasets.
Contribution
It extends the KANO model by incorporating the ChEBI knowledge graph, whose 2,840 functional groups far exceed the 82 used in the original KANO, through two approaches: Replace and Integrate.
Findings
Improved performance on 9 out of 14 datasets
Large-scale knowledge integration enhances molecular representations
ChEBI knowledge graph significantly benefits property prediction
Abstract
Pre-training machine learning models on molecular properties has proven effective for generating robust and generalizable representations, which is critical for advancements in drug discovery and materials science. While recent work has primarily focused on data-driven approaches, the KANO model introduces a novel paradigm by incorporating knowledge-enhanced pre-training. In this work, we expand upon KANO by integrating the large-scale ChEBI knowledge graph, which includes 2,840 functional groups -- significantly more than the original 82 used in KANO. We explore two approaches, Replace and Integrate, to incorporate this extensive knowledge into the KANO framework. Our results demonstrate that including ChEBI leads to improved performance on 9 out of 14 molecular property prediction datasets. This highlights the importance of utilizing a larger and more diverse set of functional groups…
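The abstract names the two approaches, Replace and Integrate, but the text available here does not describe how they work. The sketch below is only one plausible reading of the names, assuming that Replace swaps the original functional-group vocabulary for the ChEBI one and that Integrate merges the two. The group names, the toy sets, and the `build_vocabulary` helper are all hypothetical and are not taken from KANO or ChEBI.

```python
# Hypothetical illustration only: toy vocabularies, not real KANO or ChEBI data.
# "Replace" and "Integrate" are interpreted from their names, not from the paper.

# Stand-in for KANO's original 82 functional groups.
kano_groups = {"hydroxyl", "carboxyl", "amine"}
# Stand-in for the 2,840 functional groups drawn from ChEBI.
chebi_groups = {"hydroxyl", "amine", "sulfonamide", "epoxide"}


def build_vocabulary(original, large_scale, strategy):
    """Return a functional-group vocabulary under an assumed reading of each strategy."""
    if strategy == "replace":
        # Assumed: discard the original groups and use only the large-scale set.
        return set(large_scale)
    if strategy == "integrate":
        # Assumed: keep the original groups and add the large-scale ones.
        return set(original) | set(large_scale)
    raise ValueError(f"unknown strategy: {strategy!r}")


replace_vocab = build_vocabulary(kano_groups, chebi_groups, "replace")
integrate_vocab = build_vocabulary(kano_groups, chebi_groups, "integrate")

print(sorted(replace_vocab))
print(sorted(integrate_vocab))
```

Under this reading, a group present only in the original set (here the toy entry `carboxyl`) survives Integrate but not Replace, which is the kind of difference a comparison of the two approaches would need to account for.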
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Various Chemistry Research Topics
Methods: Sparse Evolutionary Training
