Protein Language Model-Powered 3D Ligand Binding Site Prediction from Protein Sequence
Shuo Zhang, Lei Xie

TL;DR
LaMPSite predicts protein-ligand binding sites using only protein sequences and ligand molecular graphs. It combines a pre-trained protein language model with a graph neural network and achieves performance competitive with structure-based methods without requiring 3D structures.
Contribution
Introduces LaMPSite, a novel method that predicts binding sites solely from protein sequences and ligand graphs, bypassing the need for 3D structures.
Findings
Achieves performance competitive with structure-based methods
Operates effectively without 3D coordinate information
Expands applicability to proteins lacking structural data
Abstract
Prediction of ligand binding sites of proteins is a fundamental and important task for understanding protein function and screening potential drugs. Most existing methods require experimentally determined protein holo-structures as input. However, such structures may be unavailable for novel or less-studied proteins. To tackle this limitation, we propose LaMPSite, which takes only protein sequences and ligand molecular graphs as input for ligand binding site prediction. The protein sequences are used to retrieve residue-level embeddings and contact maps from the pre-trained ESM-2 protein language model. The ligand molecular graphs are fed into a graph neural network to compute atom-level embeddings. Then we compute and update the protein-ligand interaction embedding based on the protein residue-level embeddings and ligand atom-level embeddings, and the geometric constraints in…
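The abstract describes a sequence-only data flow: per-residue protein embeddings, per-atom ligand embeddings from a graph neural network, and a residue-atom interaction embedding from which binding residues are predicted. The pure-Python sketch below illustrates that data flow only, under loudly stated assumptions that are not taken from LaMPSite: random vectors stand in for ESM-2 residue embeddings, the ligand GNN is a single hypothetical mean-aggregation layer (`message_passing`), the interaction embedding is an elementwise product of each residue and atom vector (`interaction_embedding`), and residues are scored with a hypothetical linear head, a max over atoms, and a sigmoid (`residue_scores`). The contact-map geometric constraints are not modeled.

```python
import math
import random

def message_passing(node_feats, edges):
    """Hypothetical one-layer GNN: each atom's new feature is the mean of its
    own feature and its bonded neighbours' features (not LaMPSite's GNN)."""
    n = len(node_feats)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:  # undirected chemical bonds
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = []
    for i in range(n):
        group = [node_feats[i]] + [node_feats[j] for j in neighbors[i]]
        dim = len(node_feats[i])
        out.append([sum(v[k] for v in group) / len(group) for k in range(dim)])
    return out

def interaction_embedding(res_emb, atom_emb):
    """Assumed pairwise interaction: elementwise product of every residue
    vector with every atom vector, giving an L x A x d nested list."""
    return [[[r * a for r, a in zip(res, atom)] for atom in atom_emb]
            for res in res_emb]

def residue_scores(inter, weight):
    """Hypothetical head: project each residue-atom pair to a logit, take the
    max over atoms, and squash with a sigmoid to a per-residue probability."""
    scores = []
    for row in inter:
        logits = [sum(w * x for w, x in zip(weight, pair)) for pair in row]
        scores.append(1.0 / (1.0 + math.exp(-max(logits))))
    return scores

def top_k_residues(scores, k):
    """Indices of the k highest-scoring residues (predicted site residues)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

if __name__ == "__main__":
    random.seed(0)
    L, A, d = 8, 3, 4  # residues, ligand atoms, embedding size (toy values)
    # Stand-in for ESM-2 per-residue embeddings (assumption: random vectors).
    res_emb = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]
    atom_feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(A)]
    bonds = [(0, 1), (1, 2)]  # a three-atom chain
    atom_emb = message_passing(atom_feats, bonds)
    inter = interaction_embedding(res_emb, atom_emb)
    weight = [random.gauss(0, 1) for _ in range(d)]
    print(top_k_residues(residue_scores(inter, weight), 3))
```

The max over atoms is one simple way to let a residue count as a binding residue if it interacts strongly with any ligand atom; a trained model would learn the head and the pooling rather than use random weights.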
Taxonomy
Topics: Computational Drug Discovery Methods · Protein Structure and Dynamics · Machine Learning in Bioinformatics
Methods: Graph Neural Network
