# ActSeek: fast and accurate search algorithm of active sites in alphafold database

**Authors:** Sandra Castillo, Osmo Henri Samuli Ollila

PMC · DOI: 10.1093/bioinformatics/btaf424 · Bioinformatics · 2025-07-26

## TL;DR

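ActSeek is a fast and accurate tool that searches the Alphafold database for proteins whose active sites resemble that of a seed protein. The authors demonstrate it by finding candidate enzymes for producing biodegradable plastics or degrading plastics, and potential off-targets for common drug molecules.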

## Contribution

Introduces ActSeek, a novel computer vision-inspired algorithm for searching active site similarities in structural databases.

## Key findings

- ActSeek finds enzymes that may be used to produce biodegradable plastics or to degrade plastics.
- The tool helps find potential off-targets for common drug molecules.
- ActSeek is implemented for fast and accurate mining of the Alphafold database.

## Abstract

Finding proteins with specific functions by mining modern databases can potentially lead to substantial advancements in a wide range of fields, from medicine and biotechnology to material science. Currently available algorithms enable mining of proteins based on their sequence or structure. However, the activities of many proteins, such as enzymes and drug targets, are dictated by active site residues and their surroundings rather than by the overall structure or sequence of the protein.

We introduce ActSeek—a computer vision-inspired fast program—that searches structural databases for proteins with active sites similar to the seed protein. ActSeek is implemented to mine proteins with desired active site environments from the Alphafold database. The potential of ActSeek to find innovative solutions to the world’s most pressing challenges is demonstrated by finding enzymes that may be used to produce biodegradable plastics or degrade plastics, as well as potential off-targets for common drug molecules.
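To make the idea of comparing active site geometry rather than whole-protein structure concrete, here is a minimal sketch. It is not ActSeek's algorithm, which this summary does not describe. It uses a generic distance RMSD between corresponding residue positions, a measure that is unchanged by rotation and translation. The function names, the coordinates, and the 1.0 Å cutoff are all hypothetical, and the sketch assumes the residue correspondence between seed and candidate is already known.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def pairwise_distances(coords: Sequence[Point]) -> List[float]:
    """All inter-residue distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(site_a: Sequence[Point], site_b: Sequence[Point]) -> float:
    """RMS difference between the two sites' internal distance sets.

    Assumes site_a[i] corresponds to site_b[i] (hypothetical pre-matching).
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must contain the same number of residues")
    if len(site_a) < 2:
        raise ValueError("need at least two residues to compare geometry")
    da = pairwise_distances(site_a)
    db = pairwise_distances(site_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def rank_candidates(
    seed: Sequence[Point],
    candidates: Dict[str, Sequence[Point]],
    cutoff: float = 1.0,  # illustrative threshold in angstroms, not from the paper
) -> List[Tuple[str, float]]:
    """Return candidates whose site geometry is within `cutoff`, best first."""
    scored = ((name, distance_rmsd(seed, coords)) for name, coords in candidates.items())
    return sorted((hit for hit in scored if hit[1] <= cutoff), key=lambda hit: hit[1])


# Made-up coordinates for a three-residue site (for example, a catalytic triad).
seed = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
candidates = {
    # Same geometry, rotated 90 degrees about z and shifted by (10, 10, 10).
    "rotated_copy": [(10.0, 10.0, 10.0), (10.0, 13.0, 10.0), (6.0, 10.0, 10.0)],
    # Same shape scaled by two, so the internal distances no longer match.
    "stretched_copy": [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 8.0, 0.0)],
}
hits = rank_candidates(seed, candidates)
print(hits)
```

Because only internal distances are compared, the rotated copy scores as identical to the seed while the stretched copy is rejected. One limitation of this simple measure is that it cannot distinguish a site from its mirror image, and a practical search also has to discover the residue correspondence rather than assume it.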

ActSeek source code is available at https://github.com/vttresearch/ActSeek under a Non-Commercial License Agreement.

## Full-text entities

- **Genes:** FLT1 (fms related receptor tyrosine kinase 1) [NCBI Gene 2321] {aka FLT, FLT-1, VEGFR-1, VEGFR1}, FLT4 (fms related receptor tyrosine kinase 4) [NCBI Gene 2324] {aka CHTD7, FLT-4, FLT41, LMPH1A, LMPHM1, PCL}, HTR1B (5-hydroxytryptamine receptor 1B) [NCBI Gene 3351] {aka 5-HT-1B, 5-HT-1D-beta, 5-HT1B, 5-HT1DB, HTR1D2, HTR1DB}, ADRB2 (adrenoceptor beta 2) [NCBI Gene 154] {aka ADRB2R, ADRBR, ARB2, B2AR, BAR, BETA2AR}, GUCY2F (guanylate cyclase 2F, retinal) [NCBI Gene 2986] {aka CYGF, GC-F, GUC2DL, GUC2F, RETGC-2, ROS-GC2}, TAAR9 (trace amine associated receptor 9) [NCBI Gene 134860] {aka TA3, TAR3, TAR9, TRAR3}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, NPR1 (natriuretic peptide receptor 1) [NCBI Gene 4881] {aka ANP-A, ANPRA, ANPa, GC-A, GUC2A, GUCY2A}, STK10 (serine/threonine kinase 10) [NCBI Gene 6793] {aka LOK, PRO2729}, TAAR2 (trace amine associated receptor 2) [NCBI Gene 9287] {aka GPR58, taR-2}, KDR (kinase insert domain receptor) [NCBI Gene 3791] {aka CD309, FLK1, VEGFR, VEGFR2}, JAK2 (Janus kinase 2) [NCBI Gene 3717] {aka JTK10}, SRMS (src-related kinase lacking C-terminal regulatory tyrosine and N-terminal myristylation sites) [NCBI Gene 6725] {aka C20orf148, PTK70, SRM, dJ697K14.1}
- **Diseases:** cardiovascular diseases (MESH:D002318), cardiac arrhythmia (MESH:D001145), myocardial infarction (MESH:D009203), anxiety (MESH:D001007), hypertension (MESH:D006973), cancer (MESH:D009369), retinal tear (MESH:D012167)
- **Chemicals:** acids (MESH:D000143), Amino acids (MESH:D000596), Erlotinib (MESH:D000069347), Sorafenib (MESH:D000077157), carbons (MESH:D002244), PHA (MESH:D054813), PET (MESH:D011093), ActSeek (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Pseudideonella sakaiensis (species) [taxon 1547922], Chromobacterium (genus) [taxon 535]
- **Mutations:** asparagine at position 536, isoleucine instead of leucine, tyrosine for position 185

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12343037/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12343037/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12343037/full.md

---
Source: https://tomesphere.com/paper/PMC12343037