CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis
Runmin Jiang, Genpei Zhang, Yuntian Yang, Siqi Wu, Minhao Wu, Wanyue Feng, Yizhou Zhao, Xi Xiao, Xiao Wang, Tianyang Wang, Xingjian Li, Muyuan Chen, Min Xu

TL;DR
CryoCCD introduces a novel conditional diffusion framework combined with biophysical modeling to generate realistic cryo-EM micrographs, improving data synthesis for structural biology applications.
Contribution
It presents the first conditional cycle-consistent diffusion model integrated with biophysical modeling for cryo-EM data synthesis, addressing data scarcity and noise realism.
Findings
Generates structurally faithful cryo-EM micrographs
Enhances particle picking and pose estimation accuracy
Outperforms existing synthesis methods in realism and generalization
Abstract
Single-particle cryo-electron microscopy (cryo-EM) has become a cornerstone of structural biology, enabling near-atomic resolution analysis of macromolecules through advanced computational methods. However, the development of cryo-EM processing tools is constrained by the scarcity of high-quality annotated datasets. Synthetic data generation offers a promising alternative, but existing approaches lack thorough biophysical modeling of heterogeneity and fail to reproduce the complex noise observed in real imaging. To address these limitations, we present CryoCCD, a synthesis framework that unifies versatile biophysical modeling with the first conditional cycle-consistent diffusion model tailored for cryo-EM. The biophysical engine provides multi-functional generation capabilities to capture authentic biological organization, and the diffusion model is enhanced with cycle consistency and…
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The proposed strategy is reasonable for improving data-driven cryo-EM analysis.
- The paper includes comparisons with multiple baseline methods, demonstrating effort toward comprehensive evaluation.
- Incremental novelty. The overall synthetic data generation framework closely follows CryoGEM, with the primary differences being the use of a larger molecular library and the (partial) replacement of the GAN generator with a standard diffusion model. Both modifications seem to be straightforward extensions.
- Annotation and applicability limitations. The proposed simulation pipeline requires annotations of molecular orientations and particle-type-specific priors, increasing annotation cost
- To my knowledge, this is the first diffusion framework to learn noise generation in cryo-EM. The noise model in cryo-EM is crucial to downstream reconstruction but also very difficult to build due to the complexity of the experimental data.
- The results, including the visual quality, common CV metrics, and the metrics for particle picking and pose accuracy, are quite impressive compared to the baselines.
- Synthesizing realistic cryo-EM micrographs is an interesting task, especially since the biggest challenges in computational cryo-EM include the lack of ground truth and the gap between real and synthetic data. However, in my opinion, the goal of this task is not actually to generate "real-looking" micrographs, but to demonstrate how better synthetic data benefits real problems in cryo-EM data processing. The authors indeed show the benefit of CryoCCD in particle picking and pose estimation. H
1. **Originality**: This paper builds upon the foundational work of CryoGEM, but its key innovation lies in the introduction of the CycleDiffusion approach. By combining the power of diffusion models with cycle consistency, the authors present a more robust framework for generating realistic cryo-EM micrographs. The novel integration of diffusion loss and cycle loss enhances the method’s stability and performance, especially in preserving structural fidelity while modeling complex noise. Moreove
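As a rough illustration of what "integration of diffusion loss and cycle loss" can mean in general, the sketch below adds a noise-prediction mean-squared error to a weighted L1 round-trip reconstruction error. This is not the paper's formulation: the function names, the choice of L1 for the cycle term, and the weight `lam` are all assumptions made for illustration.

```python
import numpy as np

def diffusion_loss(eps_pred, eps_true):
    """Mean-squared error between predicted and true noise (a common diffusion objective)."""
    return float(np.mean((eps_pred - eps_true) ** 2))

def cycle_loss(x, x_cycled):
    """Mean absolute error between an input and its round-trip reconstruction (assumed L1)."""
    return float(np.mean(np.abs(x - x_cycled)))

def combined_loss(eps_pred, eps_true, x, x_cycled, lam=1.0):
    """Hypothetical total objective: diffusion term plus lam times the cycle term."""
    return diffusion_loss(eps_pred, eps_true) + lam * cycle_loss(x, x_cycled)

# Toy tensors standing in for a batch of 4 single-channel 8x8 images.
eps_true = np.zeros((4, 8, 8))
eps_pred = np.ones((4, 8, 8))        # every noise prediction is off by 1, so MSE = 1
x = np.zeros((4, 8, 8))
x_cycled = np.full((4, 8, 8), 2.0)   # every round-trip pixel is off by 2, so L1 = 2

total = combined_loss(eps_pred, eps_true, x, x_cycled, lam=0.5)  # 1.0 + 0.5 * 2.0 = 2.0
```

In a sketch like this, the weight `lam` sets the trade-off between matching the noise distribution and preserving the input structure through the round trip.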
1. **Clarity and Methodological Ambiguity**: Several parts of the paper lack clarity and precise definition, especially in Section 3 and Section 4. The role of the mesh extracted from isosurfaces is never clearly explained—although the paper mentions mesh simplification in the Multi-Scale Volume Modeling stage, it is unclear how or whether the mesh is used in the final projection process. Similarly, the mask generation process is never defined; Section 4 directly introduces masks as inputs witho
This paper is well motivated and presents a credible direction toward physics-grounded generative modeling for cryo-EM data. The framework coherently integrates a biophysical simulator with a cycle-consistent conditional diffusion model, showing strong design consistency. Evaluation goes beyond visual fidelity to include practical downstream tasks (particle picking and reconstruction), supported by clear ablations and generalization tests.
1. No continuous conformational heterogeneity: The dataset generation pipeline relies on discrete PDB atomic models, without modeling continuous structural variability. As a result, the framework lacks datasets that capture smooth conformational transitions, which are critical for evaluating heterogeneous reconstruction performance.
2. Unspecified feature extractor for FID: Although FID scores are reported extensively, the paper does not describe the feature extractor used (architecture, traini
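To see why the feature extractor matters for the FID concern above, recall that FID is the Fréchet distance between two Gaussians fitted to extracted features, so the score depends entirely on which features go in. The sketch below is a minimal NumPy version of that distance; the function name and the random placeholder features are assumptions for illustration, since the extractor used in the paper is not known from this page.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, n_dims) feature sets.

    d^2 = ||mu_a - mu_b||^2 + Tr(S_a) + Tr(S_b) - 2 * Tr((S_a S_b)^(1/2))
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^(1/2)) equals the sum of square roots of the eigenvalues of S_a S_b,
    # which are real and non-negative for covariance matrices (up to numerical error).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Placeholder features: in practice these would come from a pretrained network
# applied to real and synthetic micrographs.
rng = np.random.default_rng(0)
real_feats = rng.standard_normal((500, 4))
shifted_feats = real_feats + 3.0  # same covariance, mean shifted by 3 in each of 4 dims

d_same = frechet_distance(real_feats, real_feats)
d_shift = frechet_distance(real_feats, shifted_feats)  # expected near 4 * 3**2 = 36
```

Because the score is computed in feature space, two papers reporting FID with different extractors are not directly comparable, which is why the reviewer asks for the architecture and training details.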
Taxonomy
Topics: Magnetic and Electromagnetic Effects
Methods: Contrastive Learning · Diffusion
