# Accurate Identification of Protein Binding Sites for All Drug Modalities Using ALLSites

**Authors:** Minjie Mou, Mingkun Lu, Zhimeng Zhou, Yanlin Ren, Xinyuan Yu, Ziqi Pan, Yuan Zhou, Hao Yang, Lingyan Zheng, Shukai Gu, Yang Zhang, Wei Hu, Fengcheng Li, Haibin Dai, Feng Zhu

PMC · DOI: 10.1002/advs.202516530 · Advanced Science · 2025-12-27

## TL;DR

ALLSites is a sequence-based method that predicts protein binding sites across all drug modalities from protein sequence alone, with no structural data required.

## Contribution

ALLSites introduces a unified sequence-based framework that accurately identifies binding sites across all drug modalities, bridging sequence- and structure-based approaches.

## Key findings

- ALLSites achieves state-of-the-art performance among sequence-based methods for predicting binding sites.
- It matches the accuracy of the best structure-based tools while being structure-free.
- The method works across diverse drug modalities, including proteins, peptides, small molecules, carbohydrates, DNA, and RNA.

## Abstract

Proteins interact with diverse molecular modalities, yet the incomplete identification of their binding sites has left the proteome‐wide druggability largely underexplored. Although various computational methods have been developed for the prediction of protein binding sites, existing approaches are limited by their specificity to a single drug modality, dependence on high‐quality structural data, or insufficient predictive accuracy. Here, a unified sequence‐based framework, ALLSites, is constructed to identify proteome‐wide binding sites across all drug modalities. Leveraging ESM‐2 embeddings, ALLSites integrates a gated convolutional network with a transformer architecture to capture both global and local sequence features, effectively modeling residue interactions directly from sequence. This design bridges the gap between sequence‐based and structure‐based approaches, enabling ALLSites to achieve superior predictive performance across diverse drug modalities, including proteins, peptides, small molecules, carbohydrates, DNA, and RNA. It achieves state‐of‐the‐art performance among sequence‐based methods and matches the accuracy of the best structure‐based tools. By enabling accurate and structure‐free binding site prediction across all drug modalities, ALLSites is expected to expand the druggable proteome and provide a powerful resource for drug discovery.

ALLSites is a unified sequence‐based framework for identifying proteome‐wide binding sites across all drug modalities. It integrates a gated convolutional network with a transformer architecture to capture residue interactions directly from the sequence. This design bridges the gap between sequence‐based and structure‐based approaches, enabling ALLSites to achieve superior predictive performance across diverse drug modalities, including proteins, peptides, small molecules, carbohydrates, DNA, and RNA. The balance between accuracy and applicability makes ALLSites a valuable resource for advancing the understanding of proteome‐wide druggability.
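To make the architectural idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows how a gated convolution (local context) and self-attention (global context) could be combined to turn per-residue embeddings into per-residue binding scores. Everything here is an assumption made for illustration: the function names, the toy dimensions, the random weights, the ordering of the two branches, the single attention head without learned projections, and the random vectors standing in for ESM-2 embeddings.

```python
import math
import random


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_conv(x, w_value, w_gate, kernel=3):
    """Local branch: GLU-style gated 1D convolution with zero padding.

    x is a list of L residue vectors of size d; each weight matrix has
    d rows of length kernel * d (hypothetical shapes for this sketch).
    """
    d = len(x[0])
    pad = kernel // 2
    padded = [[0.0] * d] * pad + x + [[0.0] * d] * pad
    out = []
    for i in range(len(x)):
        # Flatten the window of neighbouring residues into one vector.
        window = [v for row in padded[i:i + kernel] for v in row]
        value = matvec(w_value, window)
        gate = [sigmoid(g) for g in matvec(w_gate, window)]
        out.append([a * b for a, b in zip(value, gate)])
    return out


def self_attention(x):
    """Global branch: single-head scaled dot-product self-attention.

    Learned query/key/value projections are omitted to keep the sketch short.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def predict_binding(embeddings, w_value, w_gate, w_out, kernel=3):
    """Return one score in (0, 1) per residue."""
    local = gated_conv(embeddings, w_value, w_gate, kernel)
    mixed = self_attention(local)
    return [sigmoid(sum(w * h for w, h in zip(w_out, row))) for row in mixed]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
L, d, k = 6, 4, 3                       # toy sequence length, width, kernel
toy_embeddings = random_matrix(L, d, rng)  # stand-in for ESM-2 embeddings
w_value = random_matrix(d, k * d, rng)
w_gate = random_matrix(d, k * d, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(d)]

scores = predict_binding(toy_embeddings, w_value, w_gate, w_out, k)
for index, score in enumerate(scores, start=1):
    print(f"residue {index}: binding score {score:.3f}")
```

With untrained random weights the scores carry no biological meaning; the point is only the data flow: each residue receives one score, the convolution sees a fixed window of neighbours, and the attention step lets every residue draw on the whole sequence.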

## Full-text entities

- **Chemicals:** carbohydrates (MESH:D002241)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12915145/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12915145/full.md

## References

98 references — full list in the complete paper: https://tomesphere.com/paper/PMC12915145/full.md

---
Source: https://tomesphere.com/paper/PMC12915145