UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection
Jigang Fan, Quanlin Wu, Shengjie Luo, Liwei Wang

TL;DR
UniSite introduces a novel protein-centric dataset and an end-to-end detection framework for ligand binding sites, improving accuracy and addressing biases in existing methods and datasets.
Contribution
The paper presents the first UniProt-centric ligand binding site dataset and an end-to-end detection framework with set prediction loss, advancing the field of binding site prediction.
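This summary does not describe how the set prediction loss is built. As a minimal sketch of one common formulation from set-based detectors, the example below matches predicted sites to ground-truth sites one-to-one by minimum total cost, then penalizes the confidence of matched and unmatched predictions. The Dice-style cost, the brute-force matcher, all function names, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math
from itertools import permutations


def site_cost(pred_mask, true_mask):
    """1 - soft Dice overlap between a per-residue probability mask and a
    binary ground-truth mask (an assumed cost, chosen for illustration)."""
    inter = sum(p * t for p, t in zip(pred_mask, true_mask))
    denom = sum(pred_mask) + sum(true_mask)
    return 1.0 - (2.0 * inter / denom if denom else 0.0)


def match_sets(pred_masks, true_masks):
    """Optimal one-to-one matching by exhaustive search.

    Returns (assignment, cost) where assignment[j] is the index of the
    prediction matched to ground-truth site j. Only suitable for small sets.
    """
    if len(pred_masks) < len(true_masks):
        raise ValueError("need at least as many predictions as true sites")
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(pred_masks)), len(true_masks)):
        cost = sum(site_cost(pred_masks[i], true_masks[j])
                   for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


def set_prediction_loss(pred_masks, pred_conf, true_masks):
    """Matched mask cost plus a binary cross-entropy term on confidences:
    matched predictions should be confident, unmatched ones should not."""
    assignment, mask_cost = match_sets(pred_masks, true_masks)
    matched = set(assignment)
    eps = 1e-7
    conf_loss = 0.0
    for i, c in enumerate(pred_conf):
        c = min(max(c, eps), 1.0 - eps)  # clamp to keep log finite
        conf_loss += -math.log(c) if i in matched else -math.log(1.0 - c)
    return mask_cost + conf_loss, assignment


# Toy protein with 6 residues, 2 true sites, 3 predicted sites (made-up data).
true_masks = [[1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
pred_masks = [[0, 0, 0, 0, 0.9, 0.8],
              [0.1, 0, 0.1, 0, 0, 0],
              [0.9, 0.9, 0.1, 0, 0, 0]]
pred_conf = [0.9, 0.1, 0.8]

loss, assignment = set_prediction_loss(pred_masks, pred_conf, true_masks)
print(assignment, round(loss, 3))
```

The exhaustive search keeps the sketch dependency-free. For larger sets, a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual replacement.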
Findings
UniSite-DS has 4.81 times more multi-site data than previous datasets.
UniSite outperforms existing methods in ligand binding site detection.
IoU-based Average Precision reflects the performance of binding site prediction methods more accurately than traditional evaluation metrics.
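To make the metric concrete, the sketch below treats each binding site as a set of residue indices, computes Intersection over Union (IoU) between predicted and true sites, and derives an Average Precision at a single IoU threshold. The function names, the greedy matching rule, the 0.5 threshold, and the toy data are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
def iou(a, b):
    """Intersection over union of two residue-index sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_precision(predictions, truths, threshold=0.5):
    """Average Precision at one IoU threshold.

    predictions: list of (confidence, residue_set) tuples.
    truths: list of residue sets, one per ground-truth site.
    Each true site can be matched by at most one prediction.
    """
    if not truths:
        return 0.0
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    matched = set()
    hits = 0
    ap = 0.0
    for rank, (_, site) in enumerate(ranked, start=1):
        # Pick the still-unmatched true site with the highest IoU.
        best_j, best_iou = None, 0.0
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            score = iou(site, truth)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None and best_iou >= threshold:
            matched.add(best_j)
            hits += 1
            ap += hits / rank  # precision at the rank of each true positive
    return ap / len(truths)


# Made-up example: two true sites, three ranked predictions.
truths = [{1, 2, 3, 4}, {10, 11, 12}]
predictions = [(0.9, {1, 2, 3}), (0.6, {20, 21}), (0.4, {10, 11, 12, 13})]

print(average_precision(predictions, truths))
```

In this toy case the first and third predictions each overlap a true site with IoU 0.75, while the second is a false positive that lowers the precision credited to the later hit.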
Abstract
The detection of ligand binding sites for proteins is a fundamental step in Structure-Based Drug Design. Despite notable advances in recent years, existing methods, datasets, and evaluation metrics are confronted with several key challenges: (1) current datasets and methods are centered on individual protein-ligand complexes and neglect that diverse binding sites may exist across multiple complexes of the same protein, introducing significant statistical bias; (2) ligand binding site detection is typically modeled as a discontinuous workflow, employing binary segmentation and subsequent clustering algorithms; (3) traditional evaluation metrics do not adequately reflect the actual performance of different binding site prediction methods. To address these issues, we first introduce UniSite-DS, the first UniProt (Unique Protein)-centric ligand binding site dataset, which contains 4.81…
Taxonomy
Topics: Computational Drug Discovery Methods · Protein Structure and Dynamics · Machine Learning in Bioinformatics
Methods: Sparse Evolutionary Training
