A Deep Generative Model for Fragment-Based Molecule Generation
Marco Podda, Davide Bacciu, Alessio Micheli

TL;DR
This paper introduces a fragment-based deep generative language model for molecule generation. It improves validity and uniqueness rates over character-based string models and reaches performance comparable to graph-based models.
Contribution
The work proposes a language model over small molecular substructures (fragments) to raise validity rates, plus a frequency-based masking strategy to raise uniqueness rates. Together they outperform existing character-based string approaches.
Findings
Outperforms other language model-based methods in validity and uniqueness.
Reaches state-of-the-art performance, on par with graph-based models.
Generated molecules maintain properties similar to training data without explicit supervision.
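The validity and uniqueness figures mentioned above are usually computed as simple ratios over a batch of generated samples. The sketch below assumes the common definitions (validity = valid samples / all samples, uniqueness = distinct valid samples / valid samples); the paper's exact definitions may differ. The function name and the toy validator are hypothetical, and the validator only checks balanced parentheses as a stand-in for a real chemistry toolkit check.

```python
from typing import Callable, Iterable, Tuple


def validity_and_uniqueness(
    samples: Iterable[str], is_valid: Callable[[str], bool]
) -> Tuple[float, float]:
    """Return (validity rate, uniqueness rate) for generated strings.

    Assumed definitions (not taken from the paper):
      validity   = number of valid samples / number of samples
      uniqueness = number of distinct valid samples / number of valid samples
    """
    samples = list(samples)
    valid = [s for s in samples if is_valid(s)]
    validity = len(valid) / len(samples) if samples else 0.0
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in validator: parentheses must balance.

    A real pipeline would parse the string with a chemistry toolkit instead.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Four generated strings: one is malformed, one is a duplicate.
generated = ["CCO", "CCO", "C(C", "c1ccccc1"]
validity, uniqueness = validity_and_uniqueness(generated, toy_is_valid)
```

Here three of four strings pass the toy check, and two of those three are distinct, so a duplicate lowers uniqueness without affecting validity.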
Abstract
Molecule generation is a challenging open problem in cheminformatics. Currently, deep generative approaches addressing the challenge belong to two broad categories, differing in how molecules are represented. One approach encodes molecular graphs as strings of text, and learns their corresponding character-based language model. Another, more expressive, approach operates directly on the molecular graph. In this work, we address two limitations of the former: generation of invalid and duplicate molecules. To improve validity rates, we develop a language model for small molecular substructures called fragments, loosely inspired by the well-known paradigm of Fragment-Based Drug Design. In other words, we generate molecules fragment by fragment, instead of atom by atom. To improve uniqueness rates, we present a frequency-based masking strategy that helps generate molecules with infrequent…
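The abstract is cut off before it finishes describing the frequency-based masking strategy, so the following is only a generic sketch of the idea of masking by frequency, not the authors' procedure. It assumes each molecule is already split into a sequence of fragment strings, counts how often each fragment occurs in the corpus, and replaces fragments seen fewer than `min_count` times with a placeholder token. The function name, the `min_count` parameter, the `<MASK>` token, and the toy fragment labels are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mask_rare_fragments(
    corpus: List[List[str]], min_count: int, mask_token: str = "<MASK>"
) -> Tuple[List[List[str]], Dict[str, int]]:
    """Replace fragments occurring fewer than `min_count` times with a mask.

    Assumption: `corpus` holds one fragment sequence per molecule. How the
    paper builds or later resolves its mask tokens is not covered here.
    """
    # Count every fragment occurrence across all molecules.
    counts = Counter(frag for molecule in corpus for frag in molecule)
    # Keep fragments at or above the threshold, mask the rest.
    masked = [
        [frag if counts[frag] >= min_count else mask_token for frag in molecule]
        for molecule in corpus
    ]
    return masked, dict(counts)


# Toy corpus with placeholder fragment labels rather than real chemistry.
toy_corpus = [["A", "B"], ["A", "C"], ["A", "B"]]
masked_corpus, fragment_counts = mask_rare_fragments(toy_corpus, min_count=2)
```

In this toy corpus, fragment "C" occurs once and is masked, while "A" and "B" are kept. Masking shrinks the vocabulary the language model has to learn, at the cost of needing a separate rule for turning a mask back into a concrete fragment at generation time.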
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Protein Structure and Dynamics
