De Novo Drug Design with Joint Transformers
Adam Izdebski, Ewelina Weglarz-Tomczak, Ewa Szczurek, Jakub M. Tomczak

TL;DR
This paper introduces a Joint Transformer model that combines generative and predictive capabilities for de novo drug design, enabling the creation of novel molecules with optimized target properties.
Contribution
The paper presents a novel joint Transformer architecture that integrates generation and property prediction with shared weights for improved drug design.
Findings
Outperforms existing SMILES-based optimization methods
Generates molecules with enhanced target properties
Effective in exploring novel chemical space
Abstract
De novo drug design requires simultaneously generating novel molecules outside of training data and predicting their target properties, making it a hard task for generative models. To address this, we propose Joint Transformer that combines a Transformer decoder, Transformer encoder, and a predictor in a joint generative model with shared weights. We formulate a probabilistic black-box optimization algorithm that employs Joint Transformer to generate novel molecules with improved target properties and outperforms other SMILES-based optimization methods in de novo drug design.
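The shared-weight idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only assumes that one set of Transformer weights is used in two modes, a causal (decoder-style) attention mask for generation and a bidirectional (encoder-style) mask for prediction, and that a probability (written $p_{task}$ in the reviews) picks the mode at each training step. The function names and the convention that `p_task` is the probability of the generation task are hypothetical.

```python
import random


def attention_mask(n, mode):
    """Return an n x n mask; mask[i][j] is True if position i may attend to position j."""
    if mode == "causal":
        # Decoder-style: a token sees itself and earlier tokens only.
        return [[j <= i for j in range(n)] for i in range(n)]
    if mode == "bidirectional":
        # Encoder-style: every token sees the whole sequence.
        return [[True] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def sample_task(p_task, rng):
    """Pick the task for one training step.

    Assumption: with probability p_task the step trains generation,
    otherwise it trains property prediction.
    """
    return "generation" if rng.random() < p_task else "prediction"


def training_schedule(num_steps, p_task, seed=0):
    """List the (task, mask mode) pairs that a training run would alternate between."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(num_steps):
        task = sample_task(p_task, rng)
        mode = "causal" if task == "generation" else "bidirectional"
        schedule.append((task, mode))
    return schedule


if __name__ == "__main__":
    # The same (hypothetical) model weights would be used with either mask.
    for task, mode in training_schedule(num_steps=5, p_task=0.5):
        print(task, mode, attention_mask(3, mode))
```

Because only the mask changes between the two modes, no separate encoder and decoder parameters are needed in this sketch; how the actual model shares weights and balances the loss terms is described in the paper itself.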
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
## Strengths
1. Formulating *de novo* drug design as a probabilistic BBO problem is a novel perspective.
2. The writing of the paper is smooth, logical and clear, and the theoretical formulation in particular is solid.
## Weaknesses
### Related works
Chemformer (https://iopscience.iop.org/article/10.1088/2632-2153/ac3ffb/pdf) also incorporates a bidirectional encoder and an autoregressive decoder to process SMILES. It seems that the model architectures of JOINT TRANSFORMER and Chemformer are similar, but the authors make no mention of this.
### Experiments on targeted virtual screening (section 4.2)
**This part of experiments cannot prove the effectiveness of the algorithm in virtual screening tasks.** The
1. The shared-weight design of the encoder and decoder enables the model to learn a robust representation of molecules for both target prediction and SMILES sequence generation. It can also lead to a more computationally efficient model, as claimed by the authors.
2. The design of training the joint transformer using a probability hyperparameter to shift between encoder and decoder mode is interesting. It may also influence future work on similar tasks.
3. The manuscript is written with hi
1. The joint transformer design adds complexity to the training. Despite the advantages mentioned in the Strength section, the three terms (penalty, prediction loss, generative loss) in the loss function need extra heuristic hyperparameter tuning ($p_{task}$). According to the results in Table 1, the choice of $p_{task}$ and the penalty term in the loss function results in a trade-off between Validity/FCD and the prediction accuracy of the model.
2. According to Algorithm 2, the probabilistic black-box
Formulate a generic sampling algorithm with theoretical guarantees to guide the generation of novel compounds with this method.
1. The experiments and benchmarks presented are not very convincing. In targeted virtual screening, this is a typical problem. It would be better to use DUDE or PCBA as benchmark datasets; the baseline methods are also not highly comparable, lacking pretraining or deep learning-based methods. In de novo drug design, it is also necessary to compare with recent deep learning-based state-of-the-art methods, including popular VAE, GAN, diffusion, and flow matching-based models, such as G-SchNet and
1. The paper tackles an important problem and the chosen properties to condition on are also interesting.
1. The novelty is not very strong.
2. Many baselines are missing.
Taxonomy
Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography
Methods: Multi-Head Attention · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Attention Is All You Need · Adam · Residual Connection · Layer Normalization · Softmax
