Chain-of-Generation: Progressive Latent Diffusion for Text-Guided Molecular Design
Lingxiao Li, Haobo Zhang, Bin Chen, Jiayu Zhou

TL;DR
This paper introduces Chain-of-Generation (CoG), a multi-stage latent diffusion framework that progressively incorporates semantic segments from text prompts to improve molecular design, addressing limitations of one-shot conditioning.
Contribution
The paper proposes a training-free, multi-stage diffusion method that decomposes prompts into semantic segments, guiding molecule generation more effectively than existing one-shot approaches.
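The decomposition step can be pictured as building one prompt per stage. The sketch below is a hypothetical illustration, not the authors' code. The reviews indicate the paper uses an external LLM to segment prompts, so the naive `split_prompt` splitter here is only a stand-in. It also assumes, following the "cumulative prompts" wording in one review, that stage k conditions on segments 0 through k.

```python
import re


def split_prompt(prompt: str) -> list[str]:
    """Hypothetical placeholder for LLM-based segmentation: split on '.' or ';'."""
    parts = re.split(r"[.;]\s*", prompt)
    return [p.strip() for p in parts if p.strip()]


def cumulative_prompts(segments: list[str]) -> list[str]:
    """Assumed schedule: stage k sees segments 0..k, so earlier requirements are kept."""
    return ["; ".join(segments[: k + 1]) for k in range(len(segments))]


if __name__ == "__main__":
    prompt = "The molecule is a benzoic acid; it has a hydroxy group at position 4."
    for k, p in enumerate(cumulative_prompts(split_prompt(prompt))):
        print(f"stage {k}: {p}")
```

Because the prompts are cumulative rather than disjoint, the last stage always conditions on the full prompt. Earlier stages only add the coarse requirements first.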
Findings
CoG achieves higher semantic alignment with complex prompts.
It produces more diverse and controllable molecules.
The method offers transparent insights into the generation process.
Abstract
Text-conditioned molecular generation aims to translate natural-language descriptions into chemical structures, enabling scientists to specify functional groups, scaffolds, and physicochemical constraints without handcrafted rules. Diffusion-based models, particularly latent diffusion models (LDMs), have recently shown promise by performing stochastic search in a continuous latent space that compactly captures molecular semantics. Yet existing methods rely on one-shot conditioning, where the entire prompt is encoded once and applied throughout diffusion, making it hard to satisfy all the requirements in the prompt. We discuss three outstanding challenges of one-shot conditional generation: poor interpretability of the generated components, failure to generate all required substructures, and over-ambition in trying to satisfy all requirements simultaneously. We then propose three…
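To contrast one-shot conditioning with a staged alternative, here is a toy sketch. It is not the authors' implementation. One review below summarizes CoG as using cumulative prompts and mid-noise restarts, and this sketch assumes a variance-preserving schedule `abar` with `abar[0] = 1` (clean latent), deterministic DDIM-style updates, and a hypothetical denoiser `denoise_x0(x_t, t, cond)` that predicts the clean latent. One-shot conditioning corresponds to a single stage that uses the full prompt from step T.

```python
import numpy as np


def ddim_denoise(x, t_start, cond, denoise_x0, abar):
    """Deterministic DDIM-style denoising from step t_start down to 0 under one condition."""
    for t in range(t_start, 0, -1):
        x0_hat = denoise_x0(x, t, cond)  # hypothetical clean-latent prediction
        eps_hat = (x - np.sqrt(abar[t]) * x0_hat) / np.sqrt(1.0 - abar[t])
        x = np.sqrt(abar[t - 1]) * x0_hat + np.sqrt(1.0 - abar[t - 1]) * eps_hat
    return x


def staged_sample(stage_conds, denoise_x0, abar, t_mid, dim, rng):
    """Stage 0 starts from pure noise; each later stage re-noises to t_mid and denoises again."""
    T = len(abar) - 1
    x = ddim_denoise(rng.standard_normal(dim), T, stage_conds[0], denoise_x0, abar)
    for cond in stage_conds[1:]:
        # Mid-noise restart: forward-diffuse the current clean latent to step t_mid.
        noise = rng.standard_normal(dim)
        x = np.sqrt(abar[t_mid]) * x + np.sqrt(1.0 - abar[t_mid]) * noise
        x = ddim_denoise(x, t_mid, cond, denoise_x0, abar)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    abar = np.concatenate([[1.0], np.linspace(0.99, 0.01, 50)])  # assumed toy schedule
    conds = [np.ones(4), np.full(4, 2.0)]  # hypothetical embeddings of cumulative prompts
    oracle = lambda x_t, t, cond: cond  # placeholder denoiser that returns its condition
    print(staged_sample(conds, oracle, abar, t_mid=25, dim=4, rng=rng))
```

Restarting from an intermediate noise level, rather than from pure noise, is what lets later stages refine the structure produced by earlier ones instead of discarding it. With a real denoiser, `t_mid` trades off how much of that structure is preserved.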
Peer Reviews
Submitted to ICLR 2026
+ The paper introduces a novel and conceptually elegant chain-of-thought–inspired framework for progressive molecular diffusion generation. + The proposed method is training-free and easily integrates with existing latent diffusion models, improving interpretability and controllability. + Comprehensive experiments and chemically meaningful graph-based metrics demonstrate consistent performance gains over strong baselines.
+ Although the paper identifies three main limitations of one-shot conditioning, the proposed CoG framework does not show sufficiently strong improvements in addressing these issues. + The evaluation focuses mainly on structural fidelity without testing functional or physicochemical properties of generated molecules. + The method relies heavily on external LLM-based prompt segmentation, which may introduce instability or semantic errors. + Reported performance gains over GraphLDM are modest and
- Clear and well-written. The paper is concise, logically structured, and easy to follow. Figure 1 nicely illustrates the staged conditioning concept. - Good motivation and insight. The work clearly identifies the limitations of one-shot text conditioning for compositional prompts. - Practical contribution. CoG is a plug-in inference strategy that can be used with any pre-trained conditional diffusion model, without architectural change. - Conceptual analogy. This is an inference-time compositi
- Scope of novelty. As I understand it, CoG is a new inference strategy rather than a new model or architecture. Its novelty lies in orchestrating sequential conditioning during denoising, which is conceptually related to coarse-to-fine or editing-based sampling in the diffusion literature. Clarifying that distinction would help set the right expectations for readers. - Training vs. inference confusion and possible overfitting. The GraphLDM backbone is fine-tuned (post-aligned) on PubChem / ChEBI-20, and C
- The paper presents a training-free Chain-of-Generation with cumulative prompts and mid-noise restarts that could be easily integrated with existing graph diffusion backbones. - The authors point out the limitations of SMILES-based metrics, adopt graph-level metrics (MACCS Tanimoto similarity), and report consistent gains on ChEBI-20/PubChem with 100% validity, supported by ablation studies on CoG planning.
Contribution: - The demonstrated effectiveness of the proposed methodology appears constrained to compositional text prompts that mix coarse and fine structural descriptions; it is unclear whether the approach remains effective for richer and more complex textual conditions (e.g., biochemical properties). - The performance gains attributable to CoG seem modest; the results suggest that contrastive alignment contributes a substantially larger portion of the improvement. Motivation: - The overal
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Topic Modeling
