# Towards a comprehensive view of the pocketome universe—biological implications and algorithmic challenges

**Authors:** Hanne Zillmer, Dirk Walther, Turkan Haliloglu, Arne Elofsson

PMC · DOI: 10.1371/journal.pcbi.1013298 · PLOS Computational Biology · 2025-07-24

## TL;DR

This study explores all compound-binding sites (pocketomes) across eleven species to understand evolutionary trends and functional diversity.

## Contribution

The first large-scale analysis of pocketomes using high-confidence protein structures from AlphaFold.

## Key findings

- A sub-linear scaling law was observed between unique binding sites and unique protein structures.
- Functional diversity of binding sites shows signs of saturation during evolution.
- Global pocketome maps reveal differentiating features between binding pockets.
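
A sub-linear scaling law of this kind is commonly checked by fitting a power law, `n_pockets ≈ c * n_structures**alpha`, in log-log space and testing whether the exponent `alpha` is below 1. The following is a minimal sketch under that assumption, not the authors' procedure; the function name `fit_power_law` is hypothetical and the counts are synthetic, not values from the paper.

```python
import numpy as np

def fit_power_law(n_structures, n_pockets):
    """Fit n_pockets ~ c * n_structures**alpha by least squares in log-log space."""
    x = np.log(np.asarray(n_structures, dtype=float))
    y = np.log(np.asarray(n_pockets, dtype=float))
    # A straight line in log-log space: slope = alpha, intercept = log(c).
    alpha, log_c = np.polyfit(x, y, 1)
    return alpha, np.exp(log_c)

# Synthetic, illustrative counts per species (NOT data from the paper),
# generated from an exact power law with exponent 0.8 and prefactor 3.
structures = np.array([2_000, 5_000, 10_000, 20_000, 40_000])
pockets = 3.0 * structures ** 0.8

alpha, c = fit_power_law(structures, pockets)
# An exponent below 1 means unique pockets grow less than proportionally.
is_sublinear = alpha < 1.0
```

With real per-species counts, the fitted exponent would be reported together with a confidence interval, since eleven species give only eleven points for the regression.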

## Abstract

With the availability of reliably predicted 3D-structures for essentially all known proteins, characterizing the entirety of compound-binding sites (binding pockets on proteins) has become a possibility. The aim of this study was to identify and analyze all compound-binding sites, i.e., the pocketomes, of eleven species from different kingdoms of life to discern evolutionary trends as well as to arrive at a global cross-species view of the pocketome universe. Computational binding site prediction was performed on all protein structures in each species as available from the AlphaFold database. The resulting set of potential binding sites was inspected for overlaps with known pockets and annotated with regard to the protein domains in which they are located. 2D-projection plots of all pockets embedded in a 128-dimensional feature space, and characterizing them with regard to selected physicochemical properties, provide informative, global pocketome maps that unveil differentiating features between pockets. Our study revealed a sub-linear scaling law of the number of unique binding sites relative to the number of unique protein structures per species. Thus, as proteomes increased in size during evolution and therefore potentially diversified, the number of distinct binding sites, reflecting potentially diversifying functions, grew less than proportionally. We discuss the biological significance of this finding as well as identify critical and unmet algorithmic challenges.

The function of proteins is governed by specific interactions with other molecules, notably small molecules (compounds, such as metabolites). The precise nature of the protein-compound interaction, and thus the associated function, is determined by the stereochemical and physicochemical properties of the sites at which the interaction occurs (binding pockets). Thus, novel functions (binding of novel compounds) generally require the emergence of new binding sites. With the recent breakthroughs in protein structure prediction, the complete set of protein structures has become available. This allowed us to apply computational binding site predictions, to investigate the entirety of all pockets (the “pocketome”) across eleven species from different kingdoms of life, and to study how the emergence of novel binding sites relates to the increasing size of proteomes, i.e., the set of all protein structures in a given species. Our analysis uncovered a sub-linear relationship between the numbers of unique pockets and unique protein structures, suggesting that during evolution, functional diversity shows signs of saturation. This is consistent with other reports, but is approached here from the perspective of compound-binding specificities. Our study constitutes the first large-scale investigation of pocketomes based on the now available high-confidence protein structures.
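
The global pocketome maps described in the abstract are 2D projections of pockets embedded in a 128-dimensional feature space. This summary does not state which projection method the authors used, so the sketch below substitutes plain PCA (via SVD) as a simple stand-in; the function name `project_2d` is hypothetical and the embeddings are random placeholders, not pocket features from the paper.

```python
import numpy as np

def project_2d(features):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature dimension
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T

# Placeholder embeddings: 500 hypothetical pockets x 128 features (random, NOT real data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))

coords = project_2d(embeddings)  # shape (500, 2), one 2D point per pocket
```

Each row of `coords` could then be drawn as a scatter point and colored by a physicochemical property of the pocket to obtain a map in the spirit of the ones the abstract describes.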

## Linked entities

- **Species:** Mus musculus (taxon 10090)

## Full-text entities

- **Genes:** TTLL7 (tubulin tyrosine ligase like 7) [NCBI Gene 79739], RHO (rhodopsin) [NCBI Gene 6010] {aka CSNBAD1, OPN2, RP4}, LGALS3 (galectin 3) [NCBI Gene 3958] {aka CBP35, GAL3, GALBP, GALIG, L31, LGALS2}, GPR166P (G protein-coupled receptor 166, pseudogene) [NCBI Gene 442206] {aka GPCR, PGR9}
- **Chemicals:** Histidine (MESH:D006639), Glycan (MESH:D011134), ADP (MESH:D000244), acid (MESH:D000143), ATP (MESH:D000255), heme (MESH:D006418), Arg (MESH:D001120), Amino acids (MESH:D000596), Lipids (MESH:D008055), Lys (MESH:D008239), Monosaccharides (MESH:D009005), Carbohydrates (MESH:D002241), hydrogen (MESH:D006859)
- **Species:** Homo sapiens (human, species) [taxon 9606], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Mycobacterium (genus) [taxon 1763], Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12324681/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12324681/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/PMC12324681/full.md

---
Source: https://tomesphere.com/paper/PMC12324681