Seq2Mol: Automatic design of de novo molecules conditioned by the target protein sequences through deep neural networks
Ahmadreza Ghanbarpour, Markus A. Lill

TL;DR
Seq2Mol is a deep learning model that generates de novo molecules conditioned on the sequence of a target protein. The generated molecules are structurally diverse and more similar to known binders of the target than random molecules are.
Contribution
The method feeds ELMo protein sequence embeddings into a caption-style generative model that outputs SMILES strings, so generation is conditioned on the target protein rather than only on training-set resemblance or physicochemical properties.
Findings
Generated molecules are structurally diverse and target-relevant.
Produced compounds show higher similarity to known binders than random molecules.
Compounds exhibit reasonable synthesizability and drug-likeness scores.
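The similarity finding can be illustrated with a minimal sketch. The summary does not state which similarity measure the paper uses, so this assumes Tanimoto similarity over fingerprint bit sets, a common choice in cheminformatics. The function names `tanimoto` and `max_similarity` and the hand-written bit sets are hypothetical; in practice the fingerprints would be computed by a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def max_similarity(candidate, known_binders):
    """Highest similarity between a candidate and any known binder."""
    return max(tanimoto(candidate, binder) for binder in known_binders)


# Hand-written toy fingerprints (assumption: not real molecules).
known_binders = [{1, 2, 3, 4}, {2, 5, 7}]
generated = {1, 2, 3, 9}
random_molecule = {8, 10, 11}

print(max_similarity(generated, known_binders))
print(max_similarity(random_molecule, known_binders))
```

Comparing the distribution of such scores for generated molecules against the distribution for random molecules is one way to check a claim of this kind.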
Abstract
De novo design of molecules has recently enjoyed the power of generative deep neural networks. Current approaches aim to generate molecules either resembling the properties of the molecules of the training set or molecules that are optimized with respect to specific physicochemical properties. None of the methods generates molecules specific to a target protein. In the approach presented here, we introduce a method which is conditioned on the protein target sequence to generate de novo molecules that are relevant to the target. We use an implementation adapted from Google's "Show and Tell" image caption generation method, to generate SMILES strings of molecules from protein sequence embeddings generated by a deep bi-directional language model ELMo. ELMo is used to generate contextualized embedding vectors of the protein sequence. Using reinforcement learning, the trained model is…
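The conditioning idea in the abstract can be sketched roughly as follows, in the spirit of "Show and Tell": a fixed-size protein embedding sets the initial state of a recurrent decoder, which then emits SMILES tokens one at a time. This is a toy illustration under stated assumptions, not the authors' implementation. A plain tanh recurrent cell stands in for the LSTM, the weights are random and untrained, the vocabulary is a tiny hypothetical token set, and the ELMo embedding and the reinforcement learning step are not shown. All class and function names are hypothetical.

```python
import math
import random

# Toy token set; a real SMILES vocabulary is much larger (assumption).
VOCAB = ["<start>", "<end>", "C", "N", "O", "(", ")", "=", "1"]
START, END = 0, 1


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyConditionedDecoder:
    """Simple tanh recurrent decoder standing in for the LSTM (assumption)."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        self.rng = random.Random(seed)
        self.w_init = random_matrix(hidden_dim, embed_dim, self.rng)
        self.w_hh = random_matrix(hidden_dim, hidden_dim, self.rng)
        self.tok_emb = random_matrix(len(VOCAB), hidden_dim, self.rng)
        self.w_out = random_matrix(len(VOCAB), hidden_dim, self.rng)

    def sample(self, protein_embedding, max_len=20):
        # Conditioning step: the protein embedding sets the initial hidden state.
        h = [math.tanh(z) for z in matvec(self.w_init, protein_embedding)]
        prev, tokens = START, []
        for _ in range(max_len):
            recurrent = matvec(self.w_hh, h)
            h = [math.tanh(r + e) for r, e in zip(recurrent, self.tok_emb[prev])]
            probs = softmax(matvec(self.w_out, h))
            probs[START] = 0.0  # never emit the start token
            prev = self.rng.choices(range(len(VOCAB)), weights=probs)[0]
            if prev == END:
                break
            tokens.append(VOCAB[prev])
        return "".join(tokens)


# Stand-in for a pooled protein sequence embedding (made-up values).
decoder = ToyConditionedDecoder(embed_dim=4, hidden_dim=8, seed=1)
print(decoder.sample([0.1, -0.2, 0.3, 0.05]))
```

Because the weights are untrained, the output is an arbitrary token string rather than a valid SMILES; the sketch only shows where the protein embedding enters the generation loop.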
Taxonomy
Topics: Computational Drug Discovery Methods · Chemical Synthesis and Analysis · Protein Structure and Dynamics
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo
