De Novo Drug Design with Joint Transformers

Adam Izdebski; Ewelina Weglarz-Tomczak; Ewa Szczurek; Jakub M. Tomczak

arXiv:2310.02066 · cs.LG · December 5, 2023

Open Access · 4 Reviews

TL;DR

This paper introduces a Joint Transformer model that combines generative and predictive capabilities for de novo drug design, enabling the creation of novel molecules with optimized target properties.

Contribution

The paper presents a novel joint Transformer architecture that integrates generation and property prediction with shared weights for improved drug design.

Findings

1. Outperforms existing SMILES-based optimization methods
2. Generates molecules with enhanced target properties
3. Effective in exploring novel chemical space

Abstract

De novo drug design requires simultaneously generating novel molecules outside of training data and predicting their target properties, making it a hard task for generative models. To address this, we propose Joint Transformer that combines a Transformer decoder, Transformer encoder, and a predictor in a joint generative model with shared weights. We formulate a probabilistic black-box optimization algorithm that employs Joint Transformer to generate novel molecules with improved target properties and outperforms other SMILES-based optimization methods in de novo drug design.
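One common way to let a single set of Transformer weights act as both a decoder and an encoder is to change only the attention mask: causal for generation, bidirectional for prediction. The abstract does not say that Joint Transformer does this, so the sketch below is an assumption about the general technique, not the authors' implementation. The names `attention_mask` and `masked_softmax` are hypothetical.

```python
import math

def attention_mask(seq_len, mode):
    """Return mask[i][j] = True where position i may attend to position j."""
    if mode == "decoder":
        # Causal mask: token i sees only tokens 0..i (autoregressive generation).
        return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    if mode == "encoder":
        # Full mask: every token sees every token (bidirectional encoding).
        return [[True] * seq_len for _ in range(seq_len)]
    raise ValueError(f"unknown mode: {mode!r}")

def masked_softmax(scores, mask):
    """Row-wise softmax; masked-out positions receive weight 0."""
    weights = []
    for row, allowed in zip(scores, mask):
        row_max = max(s for s, a in zip(row, allowed) if a)
        exps = [math.exp(s - row_max) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# The same attention scores (standing in for shared weights) under both masks.
scores = [[0.0] * 4 for _ in range(4)]
causal = masked_softmax(scores, attention_mask(4, "decoder"))
full = masked_softmax(scores, attention_mask(4, "encoder"))
```

With identical scores, only the mask determines which tokens contribute to each position, which is why one parameter set can in principle serve both modes.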

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. Formulating *de novo* drug design as a probabilistic BBO problem is a novel perspective.
2. The writing of the paper is smooth, logical, and clear; the theoretical formulation in particular is solid.

Weaknesses

### Related works

Chemformer (https://iopscience.iop.org/article/10.1088/2632-2153/ac3ffb/pdf) also incorporates a bidirectional encoder and an autoregressive decoder to process SMILES. It seems that the model architectures of Joint Transformer and Chemformer are similar, but the authors make no mention of this.

### Experiments on targeted virtual screening (section 4.2)

**This part of the experiments cannot prove the effectiveness of the algorithm in virtual screening tasks.** The

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. The shared-weight design of the encoder and decoder enables the model to learn robust representations of molecules for both target prediction and SMILES sequence generation. It can also lead to a more computationally efficient model, as claimed by the authors.
2. Training the Joint Transformer with a probability hyperparameter that switches between encoder and decoder mode is an interesting design. It may also influence future work on similar tasks.
3. The manuscript is written with hi

Weaknesses

1. The Joint Transformer design adds complexity to training. Despite the advantages mentioned in the Strengths section, the three terms in the loss function (penalty, prediction loss, generative loss) need extra heuristic hyperparameter tuning ($p_{task}$). According to Table 1, the choice of $p_{task}$ and of the penalty term in the loss function results in a trade-off between Validity/FCD and the prediction accuracy of the model.
2. According to Algorithm 2, the probabilistic black-box
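The mode switching that Reviewer 02 describes, where a probability hyperparameter $p_{task}$ decides which objective a training step uses, can be sketched as a per-step coin flip. This is a minimal illustration only: the review does not state which mode $p_{task}$ selects, so treating it as the probability of a prediction step is an assumption, and `sample_task` and `task_counts` are hypothetical names.

```python
import random

def sample_task(p_task, rng):
    """Pick the training mode for one step.

    Assumption: p_task is the probability of a prediction (encoder-mode) step;
    otherwise the step is a generation (decoder-mode) step.
    """
    return "prediction" if rng.random() < p_task else "generation"

def task_counts(p_task, steps, seed=0):
    """Count how many of `steps` training steps fall into each mode."""
    rng = random.Random(seed)
    counts = {"prediction": 0, "generation": 0}
    for _ in range(steps):
        counts[sample_task(p_task, rng)] += 1
    return counts

# p_task near 0 trains almost only the generative objective, and p_task near 1
# almost only the predictive one. This is one way to see why the reviewer
# expects a trade-off between generation quality and prediction accuracy.
mostly_generative = task_counts(0.1, steps=1000)
```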

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

Formulates a generic sampling algorithm with theoretical guarantees to guide the generation of novel compounds with this method.

Weaknesses

1. The experiments and benchmarks presented are not very convincing. Targeted virtual screening is a typical problem, so it would be better to use DUDE or PCBA as benchmark datasets. The baseline methods are also not highly comparable, since they include no pretrained or deep learning-based methods. In de novo drug design, it is also necessary to compare with recent deep learning-based state-of-the-art methods, including popular VAE, GAN, diffusion, and flow matching-based models, such as G-SchNet and

Reviewer 04 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The paper tackles an important problem and the chosen properties to condition on are also interesting.

Weaknesses

1. The novelty is not very strong.
2. Many baselines are missing.

Taxonomy

Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography

Methods: Multi-Head Attention · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Attention Is All You Need · Adam · Residual Connection · Layer Normalization · Softmax