SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction
Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

TL;DR
This paper introduces SIU, a large-scale dataset of over a million small molecule-protein interactions with bioactivity labels, aiming to improve unbiased bioactivity prediction for drug discovery.
Contribution
The creation of a comprehensive, systematically annotated dataset of small molecule-protein interactions at a million scale for unbiased bioactivity prediction.
Findings
Classical models struggle with unbiased bioactivity prediction.
The dataset enables more accurate and systematic bioactivity analysis.
Unbiased prediction remains a challenging task.
Abstract
Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or toxic pharmacological outcomes of small molecules, rendering accurate bioactivity prediction crucial for the development of safe and effective drugs. However, existing structural datasets of small molecule-protein interactions are often limited in scale and lack systematically organized bioactivity labels, thereby impeding our understanding of these interactions and precise bioactivity prediction. In this study, we introduce a comprehensive dataset of small molecule-protein interactions,…
Taxonomy
Topics: Computational Drug Discovery Methods · Bioinformatics and Genomic Networks · Microbial Natural Products and Biosynthesis
