# LigExtract: Large-scale Automated Identification of Ligands from Protein Structures in the Protein Data Bank

**Authors:** Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes

PMC · DOI: 10.1093/gpbjnl/qzaf018 · Genomics, Proteomics & Bioinformatics · 2025-02-28

## TL;DR

LigExtract is an open-source tool that takes a list of UniProt IDs and automatically identifies and extracts the ligands in the corresponding Protein Data Bank (PDB) structures, for use in drug discovery.

## Contribution

The paper introduces LigExtract, a fully open-source, end-to-end tool for large-scale ligand identification from PDB structures.

## Key findings

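- Given a list of UniProt IDs, LigExtract returns a list of ligands, an individual PDB file for each ligand, a PDB file of the protein chains interacting with the ligand, and a series of log files.
- The tool is freely available on GitHub and targets the more complex ligand representations in PDB structures that, according to the authors, existing large-scale tools fail to address.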
- It provides logs to document extraction decisions and flag special cases like covalent ligand binding.

## Abstract

The Protein Data Bank (PDB) is an ever-growing database of three-dimensional macromolecular structures that has become a crucial resource for the drug discovery process. Exploring complexed proteins and accessing their associated ligands are essential for researchers to understand biological processes and design new compounds of pharmaceutical interest. However, currently available tools for large-scale ligand identification fail to address many of the more complex ways in which ligands are stored and represented in PDB structures. Therefore, a new tool called LigExtract was specifically developed for the large-scale processing of PDB structures and the identification of their ligands. This is a fully open-source tool available to the scientific community, designed to provide end-to-end processing. Users simply provide a list of UniProt IDs, and LigExtract returns a list of ligands, their individual PDB files, a PDB file of the protein chains interacting with the ligand, and a series of log files. These logs record the decisions made during the ligand extraction process and flag additional scenarios that might have to be considered during any follow-up use of the processed files (e.g., ligands covalently bound to the protein). LigExtract is freely available on GitHub (https://github.com/comp-medchem/LigExtract).
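To make the problem concrete, here is a minimal Python sketch of the simplest possible ligand filter: group the `HETATM` records of a PDB-format file by residue and discard water. This is not LigExtract's method and does not use its code. The function names `make_record` and `hetero_groups` are hypothetical, and the records are made up for illustration, with only the fixed columns read here filled in.

```python
from collections import defaultdict


def make_record(record, serial, atom, res_name, chain, res_seq):
    """Build a truncated PDB-style line (hypothetical helper, illustration only).

    Only the fixed columns used below are filled in: record name (1-6),
    atom serial (7-11), atom name (13-16), residue name (18-20),
    chain ID (22) and residue number (23-26).
    """
    return f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def hetero_groups(pdb_text, skip=("HOH",)):
    """Count HETATM atoms per (residue name, chain ID, residue number).

    A naive baseline: anything not stored as HETATM is invisible to it.
    """
    groups = defaultdict(int)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name in skip:
            continue  # drop water (and anything else listed in `skip`)
        chain_id = line[21]
        res_seq = int(line[22:26])
        groups[(res_name, chain_id, res_seq)] += 1
    return dict(groups)


# Made-up records: one protein atom, a two-atom hetero group, one water.
example = "\n".join([
    make_record("ATOM", 1, "N", "ALA", "A", 1),
    make_record("HETATM", 2, "C1", "BEN", "A", 301),
    make_record("HETATM", 3, "N1", "BEN", "A", 301),
    make_record("HETATM", 4, "O", "HOH", "A", 401),
])

print(hetero_groups(example))
```

A filter like this only sees what is stored as `HETATM`. A ligand deposited as its own chain in `ATOM` records, as peptides typically are, would be missed entirely, which is one example of the kind of representation that calls for a more careful approach.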

## Full-text entities

- **Genes:** PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha) [NCBI Gene 5290] {aka CCM4, CLAPO, CLOVE, CWS5, HMH, MCAP}, PTPMT1 (protein tyrosine phosphatase mitochondrial 1) [NCBI Gene 114971] {aka DUSP23, MOSP, NEDAXBA, PLIP, PNAS-129}, EZH2 (enhancer of zeste 2 polycomb repressive complex 2 subunit) [NCBI Gene 2146] {aka ENX-1, ENX1, EZH2b, KMT6, KMT6A, WVS}, EED (embryonic ectoderm development) [NCBI Gene 8726] {aka COGIS, HEED, WAIT1}, OGT (O-linked N-acetylglucosamine (GlcNAc) transferase) [NCBI Gene 8473] {aka HINCUT-1, HRNT1, MRX106, O-GLCNAC, OGT1, XLID106}, PDB [NCBI Gene 5131], SMC3 (structural maintenance of chromosomes 3) [NCBI Gene 9126] {aka BAM, BMH, CDLS3, CSPG6, HCAP, SMC3L1}, SEC61B (SEC61 translocon subunit beta) [NCBI Gene 10952], JARID2 (jumonji and AT-rich interaction domain containing 2) [NCBI Gene 3720] {aka DIDDF, JMJ}, GTF2IRD1 (GTF2I repeat domain containing 1) [NCBI Gene 9569] {aka BEN, CREAM1, GTF3, MUSTRD1, RBAP2, WBS}, APLP1 (amyloid beta precursor like protein 1) [NCBI Gene 333] {aka APLP}, SLC35B2 (solute carrier family 35 member B2) [NCBI Gene 347734] {aka HLD26, PAPST1, SLL, UGTrel4}, CSNK2A1 (casein kinase 2 alpha 1) [NCBI Gene 1457] {aka CK2A1, CKII, Cka1, Cka2, OCNDS}, F2 (coagulation factor II, thrombin) [NCBI Gene 2147] {aka PT, RPRGL2, THPH1}
- **Diseases:** PRD ID (OMIM:312550), MOAD (MESH:C536496)
- **Chemicals:** macrolide (MESH:D018942), amino acid (MESH:D000596), heparin (MESH:D006493), ADP (MESH:D000244), lipids (MESH:D008055), oligosaccharide (MESH:D009844), peptide (MESH:D010455), 4RDA (-), benzamidine (MESH:C032157), ATP (MESH:D000255), teriparatide (MESH:D019379), Cyclosporin A (MESH:D016572), benzyl chlorocarbonate (MESH:C018241), polymer (MESH:D011108), SEL2711 (MESH:C110650), colchicine (MESH:D003078)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12619641/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12619641/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12619641/full.md

---
Source: https://tomesphere.com/paper/PMC12619641