# Bidirectional reinforcement learning neural network for constrained molecular design

**Authors:** Junan Lin, Jiří Hostaš, Anguang Hu, Hang Hu, Hsu Kiang Ooi, Mohammad Sajjad Ghaemi

PMC · DOI: 10.1038/s41598-025-33443-3 · 2025-12-24

## TL;DR

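BiRLNN is a framework that pairs bidirectional recurrent sequence generation with reinforcement learning to design drug-like molecules. It uses Self-Referencing Embedded Strings (SELFIES) to guarantee syntactic validity and explores constrained chemical space more evenly than unidirectional models.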

## Contribution

BiRLNN combines bidirectional sequence generation with multi-objective reinforcement learning for constrained molecular design.

## Key findings

- BiRLNN generates molecules with 100% syntactic validity by using Self-Referencing Embedded Strings (SELFIES) representations.
- Bidirectional generation explores chemical space in a more balanced way than unidirectional models and reaches molecules that unidirectional models cannot.
- Reinforcement learning steers constrained generation toward desirable compound classes with improved reward metrics.
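
The difference between the two generation directions can be pictured with a toy sketch. It is not the paper's model: tokens are drawn uniformly at random instead of from a trained recurrent network, the strict right/left alternation is an assumption, and the function names, vocabulary, and seed fragment are all hypothetical. It only illustrates why a fixed fragment can end up inside the sequence under bidirectional growth but stays at the start under forward-only growth.

```python
import random

def grow_bidirectional(seed, vocab, max_len, rng):
    """Grow a token sequence around a fixed seed fragment by alternating
    forward (append) and backward (prepend) steps.

    Toy stand-in: tokens are sampled uniformly, whereas a trained model
    would sample from a learned distribution conditioned on the context.
    """
    tokens = list(seed)
    step = 0
    while len(tokens) < max_len:
        token = rng.choice(vocab)
        if step % 2 == 0:
            tokens.append(token)      # forward step: extend on the right
        else:
            tokens.insert(0, token)   # backward step: extend on the left
        step += 1
    return tokens

def grow_forward(seed, vocab, max_len, rng):
    """Forward-only baseline: the seed fragment always stays at the start."""
    tokens = list(seed)
    while len(tokens) < max_len:
        tokens.append(rng.choice(vocab))
    return tokens

# Illustrative SELFIES-style tokens (hypothetical vocabulary and fragment).
seed = ["[C]", "[=O]"]
vocab = ["[C]", "[N]", "[O]"]

bi = grow_bidirectional(seed, vocab, 6, random.Random(0))
fw = grow_forward(seed, vocab, 6, random.Random(0))
print("bidirectional:", "".join(bi))  # seed occupies positions 2-3
print("forward only: ", "".join(fw))  # seed occupies positions 0-1
```

With forward-only growth the constrained fragment can only ever be a prefix, so any molecule whose string places the fragment elsewhere is out of reach in this toy setting.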

## Abstract

We present BiRLNN, a bidirectional molecular design framework that combines recurrent neural networks with reinforcement learning to optimize drug-like properties of generated compounds. We examine the use of Self-Referencing Embedded Strings (SELFIES) representations, which ensure 100% syntactic validity of generated molecules. Generating molecular sequences in both forward and backward directions enables more balanced exploration of chemical space while maintaining constraint requirements during molecular design. To guide generation towards desirable pharmacological targets, we implement a multi-objective reward function based on the quantitative estimate of drug-likeness (QED) and synthetic accessibility, and apply policy gradient-based reinforcement learning for fine-tuning. Using pharmaceutically relevant fragments as constraints, we demonstrate that our bidirectional model covers the full constrained chemical space, including regions containing molecules that unidirectional models cannot reach. Moreover, the reinforcement learning process successfully steers constrained generation toward desirable compound classes with improved reward metrics. Our results demonstrate that BiRLNN offers a robust and flexible strategy for navigating chemical space in multi-objective drug design tasks.
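
The reward and fine-tuning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the reward, so the equal weights and linear combination are hypothetical, and the loss shown is the plain REINFORCE estimator, chosen only as the simplest policy-gradient method. The sketch relies on two known ranges: QED lies in [0, 1] (higher is more drug-like) and the synthetic accessibility score lies in [1, 10] (lower is easier to synthesize).

```python
import math

def multi_objective_reward(qed, sa_score, w_qed=0.5, w_sa=0.5):
    """Combine drug-likeness and synthetic accessibility into one scalar.

    qed:      quantitative estimate of drug-likeness, in [0, 1].
    sa_score: synthetic accessibility score, in [1, 10], lower is easier.
    The weights and the linear form are assumptions for illustration.
    """
    sa_norm = (10.0 - sa_score) / 9.0  # maps 1 -> 1.0 (easy), 10 -> 0.0 (hard)
    return w_qed * qed + w_sa * sa_norm

def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE surrogate loss for one sampled sequence:
    -(R - b) * sum_t log pi(a_t | s_t).
    Minimizing it raises the probability of sequences with above-baseline
    reward and lowers it for below-baseline ones.
    """
    return -(reward - baseline) * sum(token_log_probs)

# Hypothetical property values for one generated molecule.
reward = multi_objective_reward(qed=0.8, sa_score=2.8)
# Hypothetical per-token log-probabilities from the generator.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
print(reward, reinforce_loss(log_probs, reward, baseline=0.5))
```

In practice the property values would come from a cheminformatics toolkit and the log-probabilities from the generator, with the loss averaged over a batch of sampled molecules.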

## Full-text entities

- **Chemicals:** BiRLNN (-)

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12834971/full.md

---
Source: https://tomesphere.com/paper/PMC12834971