When Molecular GAN Meets Byte-Pair Encoding

Huidong Tang; Chen Li; Yasuhiko Morimoto

arXiv:2409.19740·cs.LG·October 1, 2024

TL;DR

This paper presents a novel molecular GAN that combines byte-pair encoding tokenization with reinforcement learning to improve the generation of valid, diverse, and novel drug-like molecules, addressing limitations of traditional tokenizers.

Contribution

It introduces a molecular GAN with byte-pair encoding tokenization and reinforcement learning, enhancing molecular generation quality and computational efficiency.

Findings

01. High validity and diversity in generated molecules

02. Superior novelty compared to baseline models

03. Effective reward mechanisms improve generation performance
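
To make the uniqueness and novelty findings concrete, here is a minimal Python sketch of how such scores are commonly computed. The definitions used (uniqueness as the share of distinct strings among valid generations, novelty as the share of distinct generations absent from the training set) are assumptions based on common practice, not formulas taken from the paper. The function names are hypothetical, and the sketch assumes the SMILES strings have already been validated and canonicalized upstream, for example with a cheminformatics toolkit.

```python
def uniqueness(valid_smiles):
    """Fraction of valid generated SMILES that are distinct (assumed definition)."""
    if not valid_smiles:
        return 0.0
    return len(set(valid_smiles)) / len(valid_smiles)


def novelty(valid_smiles, training_smiles):
    """Fraction of distinct generated SMILES not seen in training (assumed definition)."""
    unique = set(valid_smiles)
    if not unique:
        return 0.0
    return len(unique - set(training_smiles)) / len(unique)


# Toy data for illustration only; not results from the paper.
generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
training = ["CCO"]
print(uniqueness(generated))
print(novelty(generated, training))
```

Plain string comparison only works if both sets use the same canonical form, since one molecule can be written as many different SMILES strings.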

Abstract

Deep generative models, such as generative adversarial networks (GANs), are pivotal in discovering novel drug-like candidates via de novo molecular generation. However, traditional character-wise tokenizers often struggle with identifying novel and complex sub-structures in molecular data. In contrast, alternative tokenization methods have demonstrated superior performance. This study introduces a molecular GAN that integrates a byte-level byte-pair encoding tokenizer and employs reinforcement learning to enhance de novo molecular generation. Specifically, the generator functions as an actor, producing SMILES strings, while the discriminator acts as a critic, evaluating their quality. Our molecular GAN also integrates innovative reward mechanisms aimed at improving computational efficiency. Experimental results assessing validity, uniqueness, novelty, and diversity, complemented by…
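
As an illustration of how byte-pair encoding differs from character-wise tokenization on SMILES strings, the following is a minimal Python sketch of BPE merge learning. It is not the authors' tokenizer: the function names, the toy corpus, and the tie-breaking rule are all assumptions made for the example. Because SMILES strings are ASCII, splitting into characters here is equivalent to splitting into bytes.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe_merges(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of SMILES strings."""
    sequences = [list(s) for s in corpus]  # start from single characters (bytes)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties broken lexicographically (an assumption).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges


def tokenize(smiles, merges):
    """Tokenize a SMILES string by replaying the learned merges in order."""
    seq = list(smiles)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


# Hypothetical toy corpus, for illustration only.
corpus = ["CCO", "CCN", "CCC"]
merges = learn_bpe_merges(corpus, num_merges=1)
print(merges)
print(tokenize("CCO", merges))
```

Each learned merge turns a frequent adjacent pair into a single token, so recurring sub-structures can end up as one vocabulary entry instead of several characters.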

Taxonomy

Topics: DNA and Biological Computing