Activity Cliff Prediction: Dataset and Benchmark
Ziqiao Zhang, Bangyi Zhao, Ailin Xie, Yatao Bian, Shuigeng Zhou

TL;DR
This paper introduces ACNet, the first large-scale dataset for activity cliff (AC) prediction in drug discovery, and benchmarks 16 models on it, showing where deep learning models struggle and where traditional fingerprint-based methods remain competitive.
Contribution
It provides a dataset of over 400K Matched Molecular Pairs (MMPs) against 190 targets, organized into five subsets, and evaluates 16 models on it to establish baselines for future research in AI-driven drug discovery.
Findings
Deep learning models perform well with sufficient data.
Imbalanced and low-data scenarios remain challenging.
Traditional ECFP methods outperform some deep models on certain subsets.
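A rough sense of why imbalance is hard can be had from the approximate counts given in the abstract (over 20K MMP-cliffs versus 380K non-AC MMPs). The sketch below is only an illustration, not the paper's evaluation protocol: `majority_baseline_metrics` is a hypothetical helper, and the rounded counts stand in for the real dataset sizes. It shows that a trivial model that always predicts "non-AC" scores high accuracy while finding no cliffs at all.

```python
def majority_baseline_metrics(n_pos: int, n_neg: int) -> dict:
    """Metrics for a hypothetical classifier that always predicts the negative class."""
    total = n_pos + n_neg
    tp, fn = 0, n_pos   # every positive (AC) pair is missed
    tn, fp = n_neg, 0   # every negative (non-AC) pair is correct
    return {
        "positive_rate": n_pos / total,
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if n_pos else 0.0,
    }


# Rounded counts from the abstract; the exact numbers are an assumption here.
metrics = majority_baseline_metrics(n_pos=20_000, n_neg=380_000)
print(metrics)
```

With roughly 5% positives, accuracy alone says little, which is why ranking-based or recall-aware metrics are usually preferred in this kind of setting.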
Abstract
Activity cliffs (ACs), which are generally defined as pairs of structurally similar molecules that are active against the same bio-target but differ significantly in binding potency, are of great importance to drug discovery. To date, the AC prediction problem, i.e., predicting whether a pair of molecules exhibits the AC relationship, has not yet been fully explored. In this paper, we first introduce ACNet, a large-scale dataset for AC prediction. ACNet curates over 400K Matched Molecular Pairs (MMPs) against 190 targets, including over 20K MMP-cliffs and 380K non-AC MMPs, and provides five subsets for model development and evaluation. Then, we propose a baseline framework to benchmark the predictive performance of molecular representations encoded by deep neural networks for AC prediction, and 16 models are evaluated in experiments. Our experimental results show that deep…
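The definition of an activity cliff in the abstract can be sketched as a simple labeling rule. This is a minimal illustration under stated assumptions, not ACNet's actual curation procedure: the pair is assumed to already be a Matched Molecular Pair against the same target (structural similarity is not checked here), potencies are assumed to be in nM, and `min_fold` is a hypothetical threshold, since the abstract does not say how large a potency difference counts as "significant".

```python
def fold_difference(potency_a_nm: float, potency_b_nm: float) -> float:
    """Ratio of the weaker potency value to the stronger one (both in nM)."""
    if potency_a_nm <= 0 or potency_b_nm <= 0:
        raise ValueError("potency values must be positive")
    weaker = max(potency_a_nm, potency_b_nm)
    stronger = min(potency_a_nm, potency_b_nm)
    return weaker / stronger


def is_activity_cliff(potency_a_nm: float, potency_b_nm: float,
                      min_fold: float = 100.0) -> bool:
    """Label an (assumed) MMP as a cliff if potency differs by at least min_fold.

    min_fold=100.0 is a placeholder for illustration, not a value from the paper.
    """
    return fold_difference(potency_a_nm, potency_b_nm) >= min_fold


# Hypothetical potency values for two pairs against the same target.
print(is_activity_cliff(1.0, 500.0))   # large potency gap
print(is_activity_cliff(10.0, 50.0))   # small potency gap
```

The prediction task described in the abstract is then a binary classification over such pairs, with the label produced by a rule of this kind and the input being the two molecular structures.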
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Chemical Synthesis and Analysis
