Improving Data Representation of Metalloproteins in the Protein Data Bank
Alison Biester, Chenghua Shao, Zukang Feng, Ezra Peisach, Jasmine Y. Young, Stephen K. Burley

TL;DR
This paper discusses efforts to improve the accuracy of metal-containing protein data in the Protein Data Bank to better support biological and biochemical research.
Contribution
The paper introduces a remediation project to correct metal ligand data and enhance metalloprotein annotation using community-developed software.
Findings
Metalloproteins make up over one-third of PDB structures, but describing them chemically in a consistent way remains a challenge.
A new data model and software tools are being used to correct and annotate metal ligand data across the archive.
Accurate oxidation states and coordination geometries are critical for understanding metal function in proteins.
Abstract
The Protein Data Bank (PDB) was established in 1971 as the first open-access digital data resource in biology, initially comprising just seven X-ray crystal structures of proteins. Today, the archive houses more than 225,000 experimentally determined three-dimensional (3D) structures of biological macromolecules that are freely used by many millions of PDB data consumers worldwide. This wealth of information serves as a cornerstone for research and education endeavors across fundamental biology, biomedicine, biotechnology, and the energy sciences. The Worldwide Protein Data Bank partnership (wwPDB, wwpdb.org) includes five core members (RCSB PDB, PDBe, PDBj, BMRB, and EMDB) and one associate member (PDBc). The wwPDB jointly manages the PDB, EMDB, and BMRB core archives, which adhere to the FAIR (Findability, Accessibility, Interoperability, Reusability) principles emblematic of…
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Gene expression and cancer classification · Machine Learning in Bioinformatics
