# Sample efficient reinforcement learning with active learning for molecular design

**Authors:** Michael Dodds, Jeff Guo, Thomas Löhr, Alessandro Tibo, Ola Engkvist, Jon Paul Janet

PMC · DOI: 10.1039/d3sc04653b · 2024-02-08

## TL;DR

This paper couples an active learning system with reinforcement learning (RL–AL) to accelerate molecular design, yielding a 5–66-fold increase in hits for a fixed oracle budget and a 4–64-fold reduction in the computational time needed to find a given number of hits, relative to baseline RL.

## Contribution

RL–AL integrates active learning with reinforcement learning, using surrogate models of expensive reward functions to improve sample efficiency in molecular design.

## Key findings

- Relative to baseline RL, RL–AL generates 5–66 times more hits for a fixed oracle budget.
- The computational time needed to find a specific number of hits falls by 4–64-fold.
- Compounds from RL–AL show enriched multi-parameter scoring without reduced diversity.

## Abstract

Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL–AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5–66-fold increase in hits generated for a fixed oracle budget and a 4–64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL–AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity.
This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.

Active learning accelerates the design of molecules during generative reinforcement learning by creating surrogate models of expensive reward functions, obtaining a 4- to 64-fold reduction in computational effort per hit.
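The surrogate idea described above can be illustrated with a toy loop. This is a minimal sketch and not the authors' implementation: the one-dimensional "molecules", the quadratic `expensive_oracle`, the k-nearest-neighbour surrogate, and the greedy acquisition rule are all hypothetical stand-ins chosen for brevity, and the RL policy update is omitted entirely. It only shows how a cheap surrogate can decide which candidates are sent to the costly oracle.

```python
import random


def expensive_oracle(x):
    """Hypothetical stand-in for a costly scoring function (e.g. docking)."""
    return -(x - 0.7) ** 2  # best possible score is 0, reached at x = 0.7


def knn_predict(x, labelled, k=3):
    """Surrogate model: mean oracle score of the k nearest labelled points."""
    nearest = sorted(labelled, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_loop(rounds=5, pool_size=200, batch_size=5, seed=0):
    rng = random.Random(seed)

    # Seed the surrogate with a few randomly chosen oracle evaluations.
    seeds = [rng.random() for _ in range(batch_size)]
    labelled = [(x, expensive_oracle(x)) for x in seeds]
    oracle_calls = batch_size

    for _ in range(rounds):
        # A "generator" proposes many candidates (uniform sampling here;
        # in an RL setting this would be the policy being optimized).
        pool = [rng.random() for _ in range(pool_size)]

        # Greedy acquisition: rank the pool by surrogate prediction and
        # send only the top few candidates to the expensive oracle.
        pool.sort(key=lambda x: knn_predict(x, labelled), reverse=True)
        batch = pool[:batch_size]

        # New oracle labels extend the surrogate's training data.
        labelled.extend((x, expensive_oracle(x)) for x in batch)
        oracle_calls += batch_size

    return labelled, oracle_calls


if __name__ == "__main__":
    labelled, oracle_calls = active_learning_loop()
    best_x, best_score = max(labelled, key=lambda pair: pair[1])
    print(f"oracle calls: {oracle_calls}, best x: {best_x:.3f}, score: {best_score:.5f}")
```

With the default arguments the loop proposes 1,000 candidates but scores only 30 of them with the oracle, which is the sense in which a surrogate stretches a fixed oracle budget. Other acquisition rules (for example, ones that account for surrogate uncertainty) would slot in where the pool is ranked.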

## Full-text entities

- **Diseases:** AL (MESH:D007859)
- **Chemicals:** hydrogen (MESH:D006859), sulfonamide (MESH:D013449), halogens (MESH:D006219), SC-558 (MESH:C102643), ADV (-)
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10935729/full.md

---
Source: https://tomesphere.com/paper/PMC10935729