MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis
Shikun Feng, Jiaxin Zheng, Yinjun Jia, Yanwen Huang, Fengfeng Zhou, Wei-Ying Ma, Yanyan Lan

TL;DR
This paper introduces MoleculeCLA, a large-scale computational ligand-target binding dataset designed to improve molecular property benchmarks for drug discovery, addressing limitations of existing experimental data.
Contribution
The creation of a comprehensive, computationally derived molecular dataset with extensive properties and interpretability for benchmarking in molecular representation learning.
Findings
The dataset contains approximately 140,000 molecules with detailed properties.
Experiments across various deep learning models indicate that the dataset offers physicochemical interpretability that can guide model development and design.
The dataset enhances understanding of drug-target interactions.
Abstract
Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address these issues, we construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules, meticulously designed to capture an extensive array of chemical, physical, and biological properties, derived through a robust computational ligand-target binding analysis pipeline. We conduct extensive experiments on various deep learning models, demonstrating that our dataset offers significant physicochemical interpretability to guide model development and design.…
Taxonomy
Topics: Computational Drug Discovery Methods · Chemical Synthesis and Analysis · Mass Spectrometry Techniques and Applications
