# A labeled dataset for AI-based cryo-EM map enhancement

**Authors:** Nabin Giri, Xiao Chen, Liguo Wang, Jianlin Cheng

PMC · DOI: 10.1016/j.csbj.2025.06.041 · Computational and Structural Biotechnology Journal · 2025-06-30

## TL;DR

This paper introduces an open-source labeled dataset of 650 high-resolution (1-4 Å) experimental cryo-EM density maps, each paired with generated label maps, for training and benchmarking AI methods that denoise and enhance such maps.

## Contribution

The contribution is a standardized, open-source dataset of experimental cryo-EM maps paired with generated label maps, intended for training and benchmarking AI-based map enhancement methods.

## Key findings

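- The dataset includes 650 high-resolution (1-4 Å) experimental cryo-EM maps, each paired with three types of label maps: regression, binary classification, and atom-type classification.
- Fourier Shell Correlation analysis shows substantial resolution improvements in the label maps compared with the original experimental maps.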
- The dataset supports AI development for better cryo-EM density map interpretation.
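Resolution comparisons of this kind rest on Fourier Shell Correlation (FSC), which the abstract names as the validation method. The following is a minimal NumPy sketch of a generic FSC calculation, not the authors' pipeline. It assumes two cubic maps of identical shape that are already aligned on the same grid. The function names, the 0.5 threshold, and the choice to skip masking are all assumptions made for illustration.

```python
import numpy as np


def fourier_shell_correlation(map_a, map_b, voxel_size=1.0):
    """Return (spatial_frequency, fsc) for two aligned cubic 3D maps.

    spatial_frequency is in 1/Angstrom when voxel_size is in Angstrom.
    Hypothetical helper for illustration; no masking is applied.
    """
    if map_a.shape != map_b.shape or len(set(map_a.shape)) != 1:
        raise ValueError("expected two cubic maps of identical shape")
    n = map_a.shape[0]

    fa = np.fft.fftn(map_a)
    fb = np.fft.fftn(map_b)

    # Integer frequency index along each axis, then the shell each voxel falls in.
    idx = np.fft.fftfreq(n) * n
    kz, ky, kx = np.meshgrid(idx, idx, idx, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()

    # Keep only shells below the Nyquist index.
    n_shells = n // 2
    keep = shell < n_shells
    shell = shell[keep]

    # Per-shell sums of the cross term and of each map's power.
    cross = np.bincount(shell, weights=(fa * np.conj(fb)).real.ravel()[keep],
                        minlength=n_shells)
    power_a = np.bincount(shell, weights=(np.abs(fa) ** 2).ravel()[keep],
                          minlength=n_shells)
    power_b = np.bincount(shell, weights=(np.abs(fb) ** 2).ravel()[keep],
                          minlength=n_shells)

    denom = np.sqrt(power_a * power_b)
    fsc = np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)
    freq = np.arange(n_shells) / (n * voxel_size)
    return freq, fsc


def resolution_at_threshold(freq, fsc, threshold=0.5):
    """Resolution (1/frequency) at the first non-DC shell where FSC < threshold.

    Returns None if the curve never drops below the threshold.
    """
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return None
    return 1.0 / freq[below[0] + 1]
```

With a 1 Å voxel size, `freq` is directly in 1/Å, so the value returned by `resolution_at_threshold` is in Å. Comparing that value for an experimental map and for its label map, each scored against the same reference, is one way to quantify an improvement; the threshold used in the paper is not stated in this summary.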

## Abstract

Cryogenic electron microscopy (cryo-EM) has transformed structural biology by enabling near-atomic resolution imaging of macromolecular complexes. However, cryo-EM density maps suffer from intrinsic noise arising from structural sources, shot noise, and digital recording, which complicates accurate model building. While various methods for denoising cryo-EM density maps exist, there is a lack of standardized datasets for benchmarking artificial intelligence (AI) approaches. Here, we present an open-source dataset for cryo-EM density map denoising comprising 650 high-resolution (1-4 Å) experimental maps paired with three types of generated label maps: regression maps capturing idealized density distributions, binary classification maps distinguishing structural elements from background, and atom-type classification maps. Each map is standardized to 1 Å voxel size and validated through Fourier Shell Correlation analysis, demonstrating substantial resolution improvements in label maps compared to experimental maps. This resource bridges the gap between structural biology and artificial intelligence communities, allowing researchers to develop and benchmark innovative methods for enhancing cryo-EM density maps.
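To make the idea of a binary classification label concrete, here is a minimal sketch of one way such a map could be built from atomic coordinates on a 1 Å grid. It is not the authors' labeling procedure. The function name `binary_label_map`, the 2 Å radius, and the (x, y, z) array axis order are assumptions chosen for illustration (MRC files are often stored in a different axis order).

```python
import numpy as np


def binary_label_map(atom_xyz, grid_shape, origin=(0.0, 0.0, 0.0),
                     voxel_size=1.0, radius=2.0):
    """Mark voxels near any atom as 1 (structure) and the rest as 0 (background).

    atom_xyz: (N, 3) coordinates in Angstrom, in the same (x, y, z) order as
    the grid axes. Hypothetical helper; the radius is an assumed parameter.
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float).reshape(-1, 3)
    shape = np.asarray(grid_shape)
    label = np.zeros(grid_shape, dtype=np.uint8)

    # Integer voxel offsets that fall inside a sphere of the given radius.
    r_vox = int(np.ceil(radius / voxel_size))
    span = np.arange(-r_vox, r_vox + 1)
    offsets = np.stack(np.meshgrid(span, span, span, indexing="ij"),
                       axis=-1).reshape(-1, 3)
    offsets = offsets[np.linalg.norm(offsets * voxel_size, axis=1) <= radius]

    # Nearest voxel to each atom, then every voxel in that atom's sphere.
    centers = np.rint((atom_xyz - np.asarray(origin)) / voxel_size).astype(int)
    idx = (centers[:, None, :] + offsets[None, :, :]).reshape(-1, 3)

    # Drop voxels that fall outside the grid before writing labels.
    inside = np.all((idx >= 0) & (idx < shape), axis=1)
    idx = idx[inside]
    label[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return label
```

An atom-type classification map could follow the same pattern by writing an integer class per atom type instead of 1, although how overlapping spheres should be resolved is a design choice this sketch does not address.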

- A valuable dataset for training and evaluating AI models to enhance cryo-EM density maps.
- Enhanced maps allow more accurate interpretation of molecular structures and visualization of structural features.
- Labels capture all reference atomic components, including complex elements like rRNA.
- Neighbor-atom labeling overcomes limitations in precise atom-level matching.
- This work bridges AI and structural biology, advancing cryo-EM-based structure determination.

## Full-text entities

- **Genes:** CA1 (carbonic anhydrase 1) [NCBI Gene 759] {aka CA-I, CAB, Car1, HEL-S-11}, CNR2 (cannabinoid receptor 2) [NCBI Gene 1269] {aka CB-2, CB2, CX5}
- **Chemicals:** carbon (MESH:D002244), glycans (MESH:D011134), nitrogen (MESH:D009584), oxygen (MESH:D010100), ice (MESH:D007053), lipids (MESH:D008055)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12271583/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12271583/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12271583/full.md

---
Source: https://tomesphere.com/paper/PMC12271583