Improving Chemical Autoencoder Latent Space and Molecular De novo Generation Diversity with Heteroencoders
Esben Jannik Bjerrum, Boris Sattarov

TL;DR
This paper demonstrates that training heteroencoders with SMILES enumeration makes the latent space of chemical autoencoders more chemically relevant and better correlated with molecular similarity, which in turn improves de novo molecule generation.
Contribution
It introduces a sequence-to-sequence heteroencoder approach with SMILES enumeration that outperforms traditional autoencoders in capturing molecular similarity and relevance.
Findings
Heteroencoder latent space correlates better with molecular similarity.
SMILES enumeration improves chemical relevance of latent vectors.
The increase in decoding errors that comes with enumeration can be mitigated with more complex architectures.
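The difference between an autoencoder and a heteroencoder training pair can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the `ENUMERATED_SMILES` table is hand-written for two molecules, the function names are hypothetical, and in practice a cheminformatics toolkit would generate the alternative SMILES by randomizing atom order. The sketch also assumes the enumerated-to-enumerated setting; translating to other representations is not shown.

```python
import random

# Hypothetical hand-written table (an assumption for illustration): each
# molecule maps to several equivalent, valid SMILES strings.
ENUMERATED_SMILES = {
    "ethanol": ["CCO", "OCC", "C(O)C"],
    "toluene": ["Cc1ccccc1", "c1ccccc1C", "c1ccc(C)cc1"],
}


def autoencoder_pairs(table):
    """Build pairs where the decoder target is identical to the encoder input."""
    return [(variants[0], variants[0]) for variants in table.values()]


def heteroencoder_pairs(table, rng):
    """Build pairs where input and target are different SMILES of one molecule."""
    pairs = []
    for variants in table.values():
        # sample() picks two distinct list positions, so the strings differ
        # as long as the variants for a molecule are unique.
        source, target = rng.sample(variants, 2)
        pairs.append((source, target))
    return pairs


rng = random.Random(0)  # fixed seed so the sketch is reproducible
auto = autoencoder_pairs(ENUMERATED_SMILES)
hetero = heteroencoder_pairs(ENUMERATED_SMILES, rng)
```

In the heteroencoder pairs the model cannot succeed by copying characters from input to output, so the latent vector has to carry information about the molecule rather than about one particular string.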
Abstract
Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for employment early in a drug discovery project. Here it is shown that the choice of chemical representation, such as SMILES strings, has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory…
Taxonomy
Methods
