DrugGen: Advancing Drug Discovery with Large Language Models and Reinforcement Learning Feedback

Mahsa Sheikholeslami; Navid Mazrouei; Yousof Gheisari; Afshin Fasihi; Matin Irajpour; Ali Motahharynia

arXiv:2411.14157 · q-bio.QM · August 27, 2025 · 2 cites

TL;DR

DrugGen fine-tunes a transformer-based generative model and optimizes it with reinforcement learning feedback, improving molecule validity, predicted binding affinity, and potential for drug repositioning.

Contribution

The paper introduces DrugGen, a model that combines fine-tuning on approved drugs with reinforcement learning to generate valid molecules with higher predicted binding affinities.

Findings

1. Achieved 100% valid structure generation, surpassing DrugGPT's 95.5%.
2. Produced molecules with higher predicted binding affinities.
3. Demonstrated effective docking scores for target proteins.
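
The validity figure in the first finding is a ratio: valid generated structures divided by all generated structures. The sketch below shows that arithmetic only. It assumes molecules are SMILES strings, and `toy_is_valid` is a hypothetical stand-in that checks bracket balance, not chemistry; a real evaluation would pass a cheminformatics parser as `is_valid` instead. None of these names come from the paper.

```python
from typing import Callable, Iterable


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in validator (assumption, not from the paper).

    Accepts a non-empty string whose parentheses and square brackets are
    balanced. This is NOT a chemical validity check.
    """
    if not smiles:
        return False
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def validity_rate(
    smiles_list: Iterable[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> float:
    """Return the percentage of generated strings accepted by `is_valid`."""
    items = list(smiles_list)
    if not items:
        raise ValueError("need at least one generated structure")
    n_valid = sum(1 for s in items if is_valid(s))
    return 100.0 * n_valid / len(items)


if __name__ == "__main__":
    generated = ["CCO", "C(=O)O", "C(C", "c1ccccc1"]  # illustrative inputs
    print(f"{validity_rate(generated):.1f}% valid")
```

Passing the validator as a parameter keeps the percentage calculation independent of whichever chemistry toolkit is used to parse the structures.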

Abstract

Traditional drug design faces significant challenges due to inherent chemical and biological complexities, often resulting in high failure rates in clinical trials. Deep learning advancements, particularly generative models, offer potential solutions to these challenges. One promising algorithm is DrugGPT, a transformer-based model that generates small molecules for input protein sequences. Although promising, it generates both chemically valid and invalid structures and does not incorporate the features of approved drugs, resulting in time-consuming and inefficient drug discovery. To address these issues, we introduce DrugGen, an enhanced model based on the DrugGPT structure. DrugGen is fine-tuned on approved drug-target interactions and optimized with proximal policy optimization. By giving reward feedback from protein-ligand binding affinity prediction using pre-trained transformers…
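
The abstract is cut off above, so the exact reward and training setup are not visible here. As background on the optimizer it names, the following is a generic sketch of the clipped surrogate objective from proximal policy optimization for a single sample. It is not DrugGen's training code: the function name, the `clip_eps` default of 0.2, and the idea that `advantage` would be derived from a binding-affinity reward are all illustrative assumptions.

```python
import math


def ppo_clipped_objective(
    new_logp: float,
    old_logp: float,
    advantage: float,
    clip_eps: float = 0.2,  # assumed value, commonly used in PPO write-ups
) -> float:
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio   = pi_new(a|s) / pi_old(a|s), computed from log-probabilities
    result  = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)
    """
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes the incentive to move the policy far
    # from the old one in the direction that the advantage favors.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical numbers: the policy doubled the probability of a
    # generated molecule whose reward was above the baseline.
    print(ppo_clipped_objective(math.log(0.2), math.log(0.1), advantage=1.0))
```

With a positive advantage, the objective stops growing once the probability ratio exceeds `1 + clip_eps`, which limits how far a single update can push the generator toward any one high-reward molecule.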

Taxonomy

Topics: Computational Drug Discovery Methods

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax