When Molecular GAN Meets Byte-Pair Encoding
Huidong Tang, Chen Li, Yasuhiko Morimoto

TL;DR
This paper presents a novel molecular GAN that combines byte-pair encoding tokenization with reinforcement learning to improve the generation of valid, diverse, and novel drug-like molecules, addressing limitations of traditional tokenizers.
Contribution
It introduces a molecular GAN with byte-pair encoding tokenization and reinforcement learning, enhancing molecular generation quality and computational efficiency.
Findings
High validity and diversity in generated molecules
Superior novelty compared to baseline models
Effective reward mechanisms improve generation performance
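The findings above are stated in terms of validity, uniqueness, and novelty. This summary does not give the paper's exact definitions, so the following is only a minimal sketch of how such ratios are commonly computed. The function name `generation_metrics` and the `is_valid` callback are hypothetical. The balanced-parentheses check is a toy stand-in for a real chemical validity check, which would normally come from a cheminformatics toolkit. Strings are compared as written, without canonicalization.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty ratios (assumed definitions)."""
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)                  # distinct valid strings
    novel = unique - set(training)       # valid strings unseen in training
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def balanced_parentheses(smiles: str) -> bool:
    """Toy placeholder, NOT a chemical validity check."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


metrics = generation_metrics(
    generated=["CCO", "CCO", "C(C", "CCN"],
    training=["CCO"],
    is_valid=balanced_parentheses,
)
```

In this toy run, three of the four strings pass the placeholder check, two of those three are distinct, and one of the two distinct strings is absent from the training set.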
Abstract
Deep generative models, such as generative adversarial networks (GANs), are pivotal in discovering novel drug-like candidates via de novo molecular generation. However, traditional character-wise tokenizers often struggle with identifying novel and complex sub-structures in molecular data. In contrast, alternative tokenization methods have demonstrated superior performance. This study introduces a molecular GAN that integrates a byte-level byte-pair encoding tokenizer and employs reinforcement learning to enhance de novo molecular generation. Specifically, the generator functions as an actor, producing SMILES strings, while the discriminator acts as a critic, evaluating their quality. Our molecular GAN also integrates innovative reward mechanisms aimed at improving computational efficiency. Experimental results assessing validity, uniqueness, novelty, and diversity, complemented by…
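To illustrate how byte-level byte-pair encoding differs from character-wise tokenization, the following is a minimal sketch of the generic BPE procedure applied to SMILES strings. It is not the authors' tokenizer. The function names (`learn_bpe`, `merge_pair`, `tokenize`), the tiny corpus, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[bytes, bytes]


def merge_pair(seq: List[bytes], pair: Pair) -> List[bytes]:
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn merge rules, starting from single bytes (byte-level BPE)."""
    seqs = [[bytes([b]) for b in s.encode("utf-8")] for s in corpus]
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))  # count adjacent token pairs
        if not pairs:
            break
        # Most frequent pair; ties broken by byte order (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def tokenize(smiles: str, merges: List[Pair]) -> List[bytes]:
    """Apply the learned merges, in order, to a new SMILES string."""
    seq = [bytes([b]) for b in smiles.encode("utf-8")]
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


corpus = ["CCO", "CCN", "CCC"]          # toy corpus, for illustration only
merges = learn_bpe(corpus, num_merges=1)
tokens = tokenize("CCO", merges)        # [b'CC', b'O']
```

With a single merge learned from this toy corpus, the frequent pair `CC` becomes one token, so `CCO` is encoded as two tokens instead of three. Merged tokens of this kind are what allow a BPE vocabulary to represent recurring sub-structures as units.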
Taxonomy
Topics: DNA and Biological Computing
