MAGNet: Motif-Agnostic Generation of Molecules from Shapes
Leon Hetzel, Johanna Sommer, Bastian Rieck, Fabian Theis, and Stephan Günnemann

TL;DR
MAGNet is a graph-based molecule generation model that creates abstract shapes first, enabling flexible and diverse molecule synthesis beyond known motifs, outperforming existing methods on benchmarks.
Contribution
It introduces a novel shape-based molecule generation approach that enhances flexibility and diversity, overcoming motif dependency limitations of prior models.
Findings
MAGNet outperforms other graph-based models on standard benchmarks.
It produces molecules with more topologically distinct structures.
It generates diverse atom and bond assignments.
Abstract
Recent advances in machine learning for molecules exhibit great potential for facilitating drug discovery from in silico predictions. Most models for molecule generation rely on the decomposition of molecules into frequently occurring substructures (motifs), from which they generate novel compounds. While motif representations greatly aid in learning molecular distributions, such methods struggle to represent substructures beyond their known motif set. To alleviate this issue and increase flexibility across datasets, we propose MAGNet, a graph-based model that generates abstract shapes before allocating atom and bond types. To this end, we introduce a novel factorisation of the molecules' data distribution that accounts for the molecules' global context and facilitates learning adequate assignments of atoms and bonds onto shapes. Despite the added complexity of shape abstractions,…
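As a reading aid, the shape-first idea in the abstract can be written as a two-stage chain-rule factorisation. This is a deliberately coarse sketch, not the paper's formulation: the symbols $S$ (an untyped shape graph) and $T$ (the assignment of atom and bond types onto it) are notation introduced here, and it assumes a molecular graph $G$ is fully determined by the pair $(S, T)$. The factorisation proposed in the paper may contain further terms.

```latex
p(G) \;=\; p(S, T) \;=\; \underbrace{p(S)}_{\text{generate shape}} \; \underbrace{p(T \mid S)}_{\text{allocate atom and bond types}}
```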
Peer Reviews
Decision · Submitted to ICLR 2024
This paper has an original design of a pre-training model that makes some sense conceptually. The demonstration of conditional design prompted by a fragment or a shape is interesting as are the interpolation examples in the appendix. Unfortunately, the bulk of the paper does not come close to what I'd expect from an ICLR publication. It is hard to make a case for the significance of this work; the benchmarks are rather weak and to a degree uninteresting for practical drug discovery as there are
If I understand correctly, then what the authors call "shape" is the so-called Murcko scaffold of individual fragments. Murcko scaffold basically is: turn every atom into carbon and every bond into a single one. It would have helped if the authors clarified this point early on. The listed benchmark snippets in table 1 are not super relevant to drug discovery and the comparison models are weak compared to what one could nowadays train on a single GPU during a single week on a large dataset. It
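The operation this reviewer describes (every atom becomes carbon, every bond becomes single) can be illustrated with a minimal pure-Python sketch. The `Molecule` container, the `to_shape` function, and the toy molecules are hypothetical names introduced for illustration; they are not MAGNet's code, and whether this matches the paper's definition of a shape is the reviewer's reading, not something confirmed here.

```python
from typing import NamedTuple, Tuple


class Molecule(NamedTuple):
    """Toy molecular graph (hypothetical container, not from the paper)."""
    atoms: Tuple[str, ...]                     # element symbol per atom index
    bonds: Tuple[Tuple[int, int, float], ...]  # (i, j, bond order)


def to_shape(mol: Molecule) -> Molecule:
    """Strip chemistry: every atom becomes carbon, every bond becomes single."""
    atoms = tuple("C" for _ in mol.atoms)
    # Normalise endpoint order and sort so the result does not depend on
    # how the bonds were listed in the input.
    bonds = tuple(sorted((min(i, j), max(i, j), 1.0) for i, j, _ in mol.bonds))
    return Molecule(atoms, bonds)


# Two six-membered aromatic rings that differ only in one atom type.
ring = [(k, (k + 1) % 6) for k in range(6)]
benzene = Molecule(("C",) * 6, tuple((i, j, 1.5) for i, j in ring))
pyridine = Molecule(("N",) + ("C",) * 5, tuple((i, j, 1.5) for i, j in ring))

# Assumption: both molecules use the same atom indexing, so plain equality
# is enough here. Comparing arbitrary molecules would need a graph
# isomorphism check instead.
same_shape = to_shape(benzene) == to_shape(pyridine)
```

Here `benzene` and `pyridine` are different molecules, but `same_shape` is `True` because both reduce to the same untyped six-membered ring.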
- Factorization of molecules into abstract (untyped) shapes is a reasonable and smart idea. Chemistry-agnostic shapes provide higher-level abstraction than conventional motifs, and help the representation learning model perceive molecular shape distributions.
- Intuitively, MAGNet effectively decouples modeling structural and chemical diversity into two consecutive generation steps, hence the corresponding neural network only needs to capture distributions in a subspace instead of their product
See questions.
Innovation: The paper presents a novel approach to molecule generation, moving away from the traditional motif-based methods. The introduction of abstract shapes as an intermediate step in the generation process is a significant innovation, potentially leading to more flexible and diverse molecule generation.
Comprehensive Methodology: The authors provide a detailed explanation of the MAGNet model, its generation process, and the underlying methodology. The factorisation of the data distributio
1. Insufficient Comparative Analysis: The paper falls short in providing a comparative analysis with diffusion-based models, which are crucial in the domain of molecule generation. This lack of comparison might lead to an incomplete evaluation of MAGNet, as readers are left without a clear understanding of how the proposed model stands against these advanced alternatives. A thorough comparison, highlighting the strengths, weaknesses, and performance differences, would significantly enhance the p
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Various Chemistry Research Topics
