From Tokens to Blocks: A Block-Diffusion Perspective on Molecular Generation
Qianwei Yang, Dong Xu, Zhangfan Yang, Sisi Yuan, Zexuan Zhu, Jianqiang Li, Junkai Ji

TL;DR
SoftMol introduces a novel block-diffusion framework with soft fragments and target-aware search, significantly enhancing molecular generation quality, diversity, and efficiency for drug discovery.
Contribution
The paper proposes SoftMol, a unified approach that combines soft-fragment representation, block-diffusion modeling, and target-aware search for improved molecular design.
Findings
Achieves 100% chemical validity in generated molecules.
Improves binding affinity by 9.7%.
Increases molecular diversity 2-3 times and speeds up inference 6.6 times.
Abstract
Drug discovery can be viewed as a combinatorial search over an immense chemical space, motivating the development of deep generative models for de novo molecular design. Among these, GPT-based molecular language models (MLMs) have shown strong molecular design performance by learning chemical syntax and semantics from large-scale data. However, existing MLMs face two fundamental limitations: they inadequately capture the graph-structured nature of molecules when formulated as next-token prediction problems, and they typically lack explicit mechanisms for target-aware generation. Here, we propose SoftMol, a unified framework that co-designs molecular representation, model architecture, and search strategy for target-aware molecular generation. SoftMol introduces soft fragments, a rule-free block representation of SMILES that enables diffusion-native modeling, and develops SoftBD, the…
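The abstract is cut off before it explains how soft fragments are constructed, so the following is only an illustrative sketch of what a rule-free, block-wise view of a SMILES string might look like, not the authors' method. It assumes that a block is simply a fixed-length run of consecutive SMILES tokens; the tokenizer regex, the `block_size` value, the `<pad>` symbol, and the function names `tokenize` and `to_blocks` are all hypothetical choices made for this example.

```python
import re

# A commonly used style of regex for splitting SMILES into tokens
# (bracket atoms, two-letter halogens, single atoms, ring closures, bonds).
# Assumption: this tokenizer is illustrative and not taken from the paper.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#$:/\\\-+().~@*])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def to_blocks(tokens, block_size=4, pad="<pad>"):
    """Group tokens into fixed-length blocks without any chemical rules.

    Assumption: blocks are contiguous and equal-sized, and the last block
    is right-padded. No fragmentation rules (e.g. bond cutting) are applied.
    """
    blocks = [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1] + [pad] * (block_size - len(blocks[-1]))
    return blocks


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    for block in to_blocks(tokenize(aspirin), block_size=4):
        print(block)
```

In a block-diffusion setting, a model would presumably denoise the tokens within one such block jointly while generating blocks one after another; that part is not sketched here because the truncated abstract does not describe it.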
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
