A Semi-supervised Molecular Learning Framework for Activity Cliff Estimation
Fang Wu

TL;DR
This paper introduces SemiMol, a semi-supervised learning framework that improves molecular activity cliff prediction by leveraging unannotated data and a novel curriculum learning approach, outperforming existing methods especially in low-data scenarios.
Contribution
The paper presents SemiMol, a semi-supervised learning method with an instructor model and adaptive curriculum, specifically designed to handle activity cliffs in molecular property prediction.
Findings
SemiMol significantly improves prediction accuracy on activity cliff datasets.
It outperforms state-of-the-art pretraining and SSL baselines.
The approach is effective across 30 diverse datasets.
Abstract
Machine learning (ML) enables accurate and fast molecular property predictions, which are of interest in drug discovery and material design. Its success rests on the similarity principle, which assumes that similar molecules exhibit similar properties. However, activity cliffs challenge this principle, and their presence leads to a sharp decline in the performance of existing ML algorithms, particularly graph-based methods. To overcome this obstacle under a low-data scenario, we propose a novel semi-supervised learning (SSL) method dubbed SemiMol, which employs predictions on numerous unannotated data as pseudo-signals for subsequent training. Specifically, we introduce an additional instructor model to evaluate the accuracy and trustworthiness of proxy labels because existing pseudo-labeling approaches require probabilistic outputs to reveal the model's confidence and…
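The pseudo-labeling idea described in the abstract can be illustrated with a small, generic sketch. This is not SemiMol's implementation: the ridge regressors, the error-predicting "instructor", the random feature vectors standing in for molecules, and names such as `select_pseudo_labels` and `keep_fraction` are all assumptions made for illustration. The sketch only shows why a separate scoring model is useful when a regression model gives no probabilistic confidence of its own.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression; the penalty also covers the bias, for brevity."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(w, X):
    """Apply ridge weights (last entry is the bias) to a feature matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def select_pseudo_labels(X_lab, y_lab, X_unlab, keep_fraction=0.5, seed=0):
    """Hypothetical helper: keep the pseudo-labels an instructor trusts most.

    Assumption: the instructor is a second regressor that estimates the
    target model's absolute error. The paper's instructor may differ.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_lab))
    half = len(idx) // 2
    train, held = idx[:half], idx[half:]

    # Target model: trained on one half of the labeled data.
    w_target = fit_ridge(X_lab[train], y_lab[train])

    # Instructor: fit to the target model's absolute error on the other half.
    held_err = np.abs(predict(w_target, X_lab[held]) - y_lab[held])
    w_instr = fit_ridge(X_lab[held], held_err)

    # Pseudo-label the unannotated pool and rank by estimated error
    # (estimates may be negative; only their ordering is used).
    pseudo = predict(w_target, X_unlab)
    est_err = predict(w_instr, X_unlab)
    n_keep = int(keep_fraction * len(X_unlab))
    keep = np.argsort(est_err)[:n_keep]
    return X_unlab[keep], pseudo[keep], keep


# Synthetic stand-in data: 40 labeled and 100 unlabeled 5-dimensional vectors.
rng = np.random.default_rng(42)
X_lab = rng.normal(size=(40, 5))
y_lab = X_lab @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=40)
X_unlab = rng.normal(size=(100, 5))

X_keep, y_keep, keep = select_pseudo_labels(X_lab, y_lab, X_unlab)

# Retrain on the labeled data plus the retained pseudo-labeled examples.
X_aug = np.vstack([X_lab, X_keep])
y_aug = np.concatenate([y_lab, y_keep])
w_final = fit_ridge(X_aug, y_aug)
```

In this toy setup the instructor simply ranks unlabeled points by estimated error, and the lower-ranked half is merged into the training set before refitting.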
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
