Seq2Bind Webserver for Decoding Binding Hotspots directly from Sequences using Fine-Tuned Protein Language Models
Xiang Ma, Supantha Dey, Vaishnavey SR, Casey Zelinski, Qi Li, and Ratul Chowdhury

TL;DR
Seq2Bind Webserver uses fine-tuned protein language models to predict protein-protein binding hotspots directly from sequences, offering a fast and accurate alternative to structural methods for identifying critical interaction residues.
Contribution
This work introduces a sequence-based framework that uses fine-tuned PLMs for residue-level PPI prediction, outperforming HADDOCK structural docking at N-factor=2 and enabling rapid hotspot identification.
Findings
ESM2 achieved 49.5% accuracy at N-factor=3
Sequence-based predictions outperform HADDOCK docking at N-factor=2
The approach enables rapid screening within minutes
Abstract
Decoding protein-protein interactions (PPIs) at the residue level is crucial for understanding cellular mechanisms and developing targeted therapeutics. We present Seq2Bind Webserver, a computational framework that leverages fine-tuned protein language models (PLMs) to determine binding affinity between proteins and identify critical binding residues directly from sequences, eliminating the structural requirements that limit most affinity prediction tools. We fine-tuned four architectures (ProtBERT, ProtT5, ESM2, and BiLSTM) on the SKEMPI 2.0 dataset containing 5,387 protein pairs with experimental binding affinities. Through systematic alanine mutagenesis on each residue for 14 therapeutically relevant protein complexes, we evaluated each model's ability to identify interface residues. Performance was assessed using N-factor metrics, where N-factor=3 evaluates whether true…
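The systematic alanine mutagenesis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name alanine_scan, the predict_affinity callable, and the toy scorer are all hypothetical stand-ins for a fine-tuned PLM. Skipping residues that are already alanine and ranking by absolute change in predicted affinity are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: any model mapping two sequences to an affinity score.
AffinityFn = Callable[[str, str], float]


def alanine_scan(
    seq_a: str, seq_b: str, predict_affinity: AffinityFn
) -> List[Tuple[int, str, float]]:
    """Mutate each residue of seq_a to alanine and record the change in
    predicted affinity for seq_b. Returns (1-based position, wild-type
    residue, delta) tuples, largest absolute change first."""
    baseline = predict_affinity(seq_a, seq_b)
    effects = []
    for i, residue in enumerate(seq_a):
        if residue == "A":
            continue  # assumption: an A-to-A substitution is uninformative
        mutant = seq_a[:i] + "A" + seq_a[i + 1:]
        delta = predict_affinity(mutant, seq_b) - baseline
        effects.append((i + 1, residue, delta))
    # Assumption: a larger absolute change marks a more likely hotspot.
    effects.sort(key=lambda item: abs(item[2]), reverse=True)
    return effects


def toy_affinity(seq_a: str, seq_b: str) -> float:
    """Toy scorer for demonstration only: counts tryptophans in seq_a."""
    return float(seq_a.count("W"))


ranking = alanine_scan("MKWAL", "GG", toy_affinity)
print(ranking[0])
```

In practice, predict_affinity would wrap whichever fine-tuned model is being evaluated, and the ranked positions would then be compared against known interface residues.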
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
Methods: Long Short-Term Memory · Bidirectional LSTM
