Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity

Hyosoon Jang; Yunhui Jang; Jaehyung Kim; Sungsoo Ahn

arXiv:2410.03138 · cs.LG · February 18, 2025 · 3 cites

TL;DR

This paper introduces an approach that combines supervised fine-tuning with reinforcement learning so that large language models generate structurally diverse molecules. It addresses the tendency of current models to produce similar molecules, which limits their usefulness for drug discovery.

Contribution

The paper presents a new method combining supervised fine-tuning and reinforcement learning to improve the structural diversity of molecules generated by LLMs.

Findings

01 Enhanced molecular diversity compared to existing methods

02 Effective autoregressive generation of diverse molecules

03 Improved alignment of textual and structural diversity

Abstract

Recent advancements in large language models (LLMs) have demonstrated impressive performance in molecular generation, which offers potential to accelerate drug discovery. However, the current LLMs overlook a critical requirement for drug discovery: proposing a diverse set of molecules. This diversity is essential for improving the chances of finding a viable drug, as it provides alternative molecules that may succeed where others fail in real-world validations. Nevertheless, the LLMs often output structurally similar molecules. While decoding schemes like diverse beam search may enhance textual diversity, this often does not align with molecular structural diversity. In response, we propose a new method for fine-tuning molecular generative LLMs to autoregressively generate a set of structurally diverse molecules, where each molecule is generated by conditioning on the previously…
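The gap between textual and structural diversity noted in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's metric or method. It assumes a common convention, diversity as one minus the mean pairwise Tanimoto similarity, and uses hand-written, hypothetical feature sets in place of real molecular fingerprints (which would normally come from a cheminformatics toolkit). The helper names `tanimoto`, `internal_diversity`, and `char_bigrams` are illustrative only. The two strings `CCO` and `OCC` are both valid SMILES for ethanol, so they differ as text while denoting one structure.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def internal_diversity(feature_sets) -> float:
    """One minus the mean pairwise Tanimoto similarity (assumed convention)."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0  # a single item has no pairwise diversity
    mean_sim = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim


def char_bigrams(text: str) -> frozenset:
    """Character bigrams, used here as a crude proxy for textual content."""
    return frozenset(text[i:i + 2] for i in range(len(text) - 1))


# Two different SMILES strings that both denote ethanol.
smiles = ["CCO", "OCC"]

# Textual view: compare the strings themselves.
textual_diversity = internal_diversity([char_bigrams(s) for s in smiles])

# Structural view: hypothetical toy feature set standing in for a real
# fingerprint. Both strings map to the same set because they are the
# same molecule.
ethanol_features = frozenset({"C-C", "C-O", "O-H"})
structural_diversity = internal_diversity([ethanol_features, ethanol_features])

print(f"textual diversity:    {textual_diversity:.3f}")
print(f"structural diversity: {structural_diversity:.3f}")
```

In this toy case the strings look different while the structures are identical, so a decoding scheme that only rewards different token sequences can score well on textual diversity without adding any new molecule.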

Taxonomy

Topics: Chemical Synthesis and Analysis

Methods: Sparse Evolutionary Training · ALIGN