STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation
Bum Chul Kwon, Ben Shapira, Moshiko Raboh, Shreyans Sethi, Shruti Murarka, Joseph A Morrone, Jianying Hu, Parthasarathy Suryanarayanan

TL;DR
STAR-VAE is a scalable Transformer-based variational autoencoder for molecular generation. It supports conditional, property-guided generation of drug-like molecules and can be fine-tuned efficiently with low-rank adapters.
Contribution
It presents a novel Transformer-based latent-variable model trained on large-scale data, with a principled conditional formulation and low-rank adapter fine-tuning for molecular design.
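As a rough illustration of what low-rank adapter fine-tuning means in general, the sketch below applies the standard LoRA update, W_eff = W + (alpha / r) * B A, to a small weight matrix in plain Python. The dimensions, rank, scaling factor, and function names here are hypothetical choices for illustration; the summary does not say which layers STAR-VAE adapts or which hyperparameters it uses.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the generic LoRA-adapted weight.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical sizes, chosen only to keep the example small.
d_out, d_in, rank = 8, 8, 2
rng = random.Random(0)

w = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
# B starts at zero, so the adapter is a no-op before any training step.
b = [[0.0] * rank for _ in range(d_out)]

w_eff = lora_effective_weight(w, a, b, alpha=8.0)

full_params = d_out * d_in           # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters updated by the adapter
```

Only `a` and `b` would be trained while `w` stays frozen, which is why the trainable parameter count grows with the rank rather than with the full matrix size.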
Findings
Achieves state-of-the-art results on benchmark datasets.
Supports both unconditional exploration and property-aware generation.
Effectively shifts docking scores toward stronger predicted binding.
Abstract
The chemical space of drug-like molecules is vast, motivating the development of generative models that must learn broad chemical distributions, enable conditional generation by capturing structure-property representations, and provide fast molecular generation. Meeting these objectives depends on modeling choices, including the probabilistic modeling approach, the conditional generative formulation, the architecture, and the molecular input representation. To address these challenges, we present STAR-VAE (Selfies-encoded, Transformer-based, AutoRegressive Variational Auto Encoder), a scalable latent-variable framework with a Transformer encoder and an autoregressive Transformer decoder. It is trained on 79 million drug-like molecules from PubChem, using SELFIES to guarantee syntactic validity. The latent-variable formulation enables conditional generation: a property predictor supplies a…
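For readers unfamiliar with latent-variable training, the following is a minimal sketch of the two generic VAE ingredients that an encoder/decoder pair like the one described above relies on: reparameterized sampling from a diagonal Gaussian posterior, and the KL term against a standard normal prior. It assumes the textbook VAE objective; the function names, the `beta` weight, and the toy numbers are illustrative assumptions, not details taken from the paper, and the conditional formulation is not sketched because the abstract text is cut off at that point.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    mu and log_var stand in for the outputs of an encoder
    (here plain lists; a real model would use tensors).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_elbo(recon_nll, mu, log_var, beta=1.0):
    """Reconstruction negative log-likelihood plus a beta-weighted KL term.

    recon_nll would come from the autoregressive decoder's token-level
    cross-entropy; beta is a hypothetical weighting knob.
    """
    return recon_nll + beta * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]  # toy encoder outputs
z = reparameterize(mu, log_var, rng)   # latent vector passed to the decoder
loss = negative_elbo(recon_nll=2.0, mu=mu, log_var=log_var)
```

The KL term is zero exactly when the posterior matches the prior, and it grows as the encoder moves the mean away from zero or changes the variance away from one.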
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Protein Structure and Dynamics
