SiteFerret: beyond simple pocket identification in proteins
Luca Gagliardi, Walter Rocchia

TL;DR
SiteFerret is a versatile, unsupervised method for detecting and characterizing protein pockets, including subpockets, using hierarchical clustering and anomaly detection, improving binding site prediction accuracy.
Contribution
It introduces a novel hierarchical clustering and anomaly detection approach for pocket identification and subpocket segmentation on protein surfaces.
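A hierarchical clustering that yields pockets, with subpockets nested inside them, can be illustrated with a minimal sketch. It assumes each probe sphere is a hypothetical `(x, y, z, radius)` tuple in ångströms. It uses a generic single-linkage rule, in which two spheres are linked when they overlap by at least a threshold, and runs it at a loose and a strict threshold. This is not SiteFerret's ad hoc clustering scheme, whose details are not given here. `cluster_spheres`, `pockets_with_subpockets`, and the threshold values are illustrative names and choices.

```python
import math
from itertools import combinations

# Hypothetical probe sphere: (x, y, z, radius), coordinates in angstroms.
Sphere = tuple[float, float, float, float]


def _find(parent: list[int], i: int) -> int:
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_spheres(spheres: list[Sphere], min_overlap: float) -> list[list[int]]:
    """Single-linkage clustering: spheres i and j are linked when
    r_i + r_j - d_ij >= min_overlap (a larger value is a stricter rule)."""
    parent = list(range(len(spheres)))
    for i, j in combinations(range(len(spheres)), 2):
        *ci, ri = spheres[i]
        *cj, rj = spheres[j]
        if ri + rj - math.dist(ci, cj) >= min_overlap:
            parent[_find(parent, i)] = _find(parent, j)
    groups: dict[int, list[int]] = {}
    for i in range(len(spheres)):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())


def pockets_with_subpockets(spheres, loose=0.0, strict=1.0):
    """Pockets at the loose threshold, each split into its strict-threshold subpockets."""
    subpockets = cluster_spheres(spheres, strict)
    return [[s for s in subpockets if s[0] in set(pocket)]
            for pocket in cluster_spheres(spheres, loose)]


# Two overlapping pairs joined by a weak contact, plus one isolated sphere.
spheres = [(0.0, 0.0, 0.0, 2.0), (2.5, 0.0, 0.0, 2.0), (6.0, 0.0, 0.0, 2.0),
           (8.5, 0.0, 0.0, 2.0), (30.0, 0.0, 0.0, 2.0)]
print(pockets_with_subpockets(spheres))  # [[[0, 1], [2, 3]], [[4]]]
```

Every link accepted at the strict threshold is also accepted at the loose one. Each subpocket therefore falls entirely inside a single pocket, and this nesting is what makes the two levels a hierarchy.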
Findings
Accurately predicts diverse binding sites, including those of small molecules and peptides.
Provides detailed feature importance analysis for binding site characterization.
Subpocket segmentation enhances binding site localization.
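A precision-style measure shows one way to quantify the claim that subpockets localize a binding site more sharply. It computes the fraction of a (sub)pocket's probe-sphere centers that lie near a ligand atom. The function name, the metric itself, and the 4 Å cutoff are hypothetical illustrations, not SiteFerret's evaluation protocol.

```python
import math

Point = tuple[float, float, float]


def contact_precision(centers: list[Point], ligand: list[Point], cutoff: float = 4.0) -> float:
    """Fraction of probe-sphere centers within `cutoff` angstroms of any ligand atom.
    Hypothetical metric; the 4.0 angstrom default is an illustrative choice."""
    if not centers:
        return 0.0
    hits = sum(1 for c in centers if any(math.dist(c, a) <= cutoff for a in ligand))
    return hits / len(centers)


ligand = [(0.5, 0.0, 0.0)]                      # single hypothetical ligand atom
sub_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]      # subpocket touching the ligand
sub_b = [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]    # subpocket far from it
pocket = sub_a + sub_b                           # the unsegmented pocket
print(contact_precision(pocket, ligand), contact_precision(sub_a, ligand))  # 0.5 1.0
```

Splitting the pocket discards the half that never approaches the ligand. As a result, the subpocket's precision rises from 0.5 to 1.0 while it still covers every contacting sphere.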
Abstract
We present a novel method for the automatic detection of pockets on protein molecular surfaces. The algorithm is based on an ad hoc hierarchical clustering of virtual solvent-excluded surface (SES) probe spheres, obtained from the geometric primitives generated by the NanoShaper software. The final ranking of putative pockets is based on Isolation Forest, an unsupervised learning method originally developed for anomaly detection. A detailed importance analysis of pocket features provides insight into which geometric (clustering) and chemical (residue) properties characterize a good binding site. The method also segments pockets into smaller subpockets. We prove that subpockets are a reliable representation that pinpoints the binding site with greater precision. SiteFerret stands out for its versatility, accurately predicting a wide range of binding sites, from small molecules…
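The following from-scratch sketch makes the ranking step concrete. It implements the standard Isolation Forest algorithm using only the standard library: random axis-aligned splits, path lengths normalized by c(n), and the anomaly score 2^(-E[h]/c(n)). Several elements are assumptions for illustration: the pocket descriptors, the parameters, and the choice to rank more anomalous candidates first. The abstract only states that the ranking is based on Isolation Forest. In practice, an existing implementation such as scikit-learn's `IsolationForest` would normally be used instead.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points,
    used to normalize isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # a constant feature cannot be split
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x is isolated, plus c(n) for points left unresolved in a leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    child = left if x[feature] < split else right
    return path_length(child, x, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two samples")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values near 1 mean x is isolated quickly."""
        mean_h = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c_factor(self.psi))


def rank_pockets(features):
    """Hypothetical ranking: pocket indices, most anomalous first."""
    forest = IsolationForest(seed=0).fit(features)
    scores = [forest.score(f) for f in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Hypothetical 2-D pocket descriptors: 50 similar candidates and one outlier.
pockets = [(math.cos(k) * 0.5, math.sin(k) * 0.5) for k in range(50)] + [(10.0, 10.0)]
print(rank_pockets(pockets)[0])  # 50: the outlier is ranked first
```

Isolation Forest needs no labels. A candidate whose feature vector is separated from the rest after only a few random splits receives a short average path and therefore a high score.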
Taxonomy
Topics: Protein Structure and Dynamics · Advanced Proteomics Techniques and Applications · Machine Learning in Bioinformatics
Methods: High-Order Consensuses
