# Tuning Reinforcement Learning Parameters for Cluster Selection to Enhance Evolutionary Algorithms

**Authors:** Nathan Villavicencio, Michael N. Groves

PMC · DOI: 10.1021/acsengineeringau.3c00068 · ACS Engineering Au · 2024-04-16

## TL;DR

This paper introduces a reinforcement-learning-based method for choosing which cluster of the starting population an evolutionary algorithm runs on, and tunes the four parameters governing that choice to improve searches for optimal molecular structures.

## Contribution

The contribution is a reinforcement-learning-based system that selects clusters for evolutionary algorithm runs via a dynamic probability, using four tunable reward and penalty parameters to balance exploration and exploitation.

## Key findings

- Parameters MFavOvrAll-A and Select-D have a significantly larger effect on evolutionary algorithm performance than MFavClus-B and NoNewFavClus-C.
- The most successful models strike a balance between MFavOvrAll-A and Select-D that reflects the exploration vs exploitation trade-off.
- The proposed method outperforms an unclustered evolutionary algorithm in quinoline-like structure searches.

## Abstract

The ability to find optimal molecular structures with desired properties is a popular challenge, with applications in areas such as drug discovery. Genetic algorithms are a common approach to global minima molecular searches due to their ability to search large regions of the energy landscape and decrease computational time via parallelization. To decrease the number of unstable intermediate structures being produced and increase the overall efficiency of an evolutionary algorithm, clustering has been introduced in multiple instances. However, there is little literature detailing the effects of differentiating the selection frequencies between clusters. To find a balance between exploration and exploitation in our genetic algorithm, we propose a system of clustering the starting population and choosing clusters for an evolutionary algorithm run via a dynamic probability that depends on the fitness of molecules generated by each cluster. We define four parameters, MFavOvrAll-A, MFavClus-B, NoNewFavClus-C, and Select-D, that correspond to a reward for producing the best structure overall, a reward for producing the best structure in its own cluster, a penalty for not producing the best structure, and a penalty based on the selection ratio of the cluster, respectively. A reward increases the probability of a cluster’s future selection, while a penalty decreases it. To optimize these four parameters, we used a Gaussian distribution to approximate the evolutionary algorithm performance of each cluster and performed a grid search over different parameter combinations. Results show that parameter MFavOvrAll-A (rewarding clusters for producing the best structure overall) and parameter Select-D (appearance penalty) have a significantly larger effect than parameters MFavClus-B and NoNewFavClus-C. To produce the most successful models, a balance between MFavOvrAll-A and Select-D must be struck that reflects the exploitation vs exploration trade-off often seen in reinforcement learning algorithms. Results show that our reinforcement-learning-based method for selecting clusters outperforms an unclustered evolutionary algorithm for quinoline-like structure searches.
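
The selection scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this summary does not give the exact update formulas, so the additive weight updates, the weight floor, the convention that lower fitness is better, and all names (`Params`, `run_search`, `grid_search`) and numeric values are assumptions made for illustration. Each evolutionary algorithm run is replaced by a draw from a Gaussian distribution per cluster, in the spirit of the approximation the abstract mentions, and a small grid search varies only MFavOvrAll-A and Select-D.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass
class Params:
    a: float  # MFavOvrAll-A: reward for producing the best structure overall
    b: float  # MFavClus-B: reward for producing the best structure in its own cluster
    c: float  # NoNewFavClus-C: penalty for not producing a new best structure
    d: float  # Select-D: penalty scaled by the cluster's selection ratio


def run_search(params, means, sigma=1.0, steps=200, seed=0):
    """Simulate cluster selection; return the best (lowest) fitness found.

    Assumptions: lower fitness is better, and updates are additive on
    unnormalized selection weights. Neither is confirmed by the source.
    """
    rng = random.Random(seed)
    n = len(means)
    weights = [1.0] * n                   # unnormalized selection weights
    picks = [0] * n                       # times each cluster was selected
    best_in_cluster = [float("inf")] * n
    best_overall = float("inf")
    for step in range(1, steps + 1):
        # Choose a cluster with probability proportional to its weight.
        k = rng.choices(range(n), weights=weights)[0]
        picks[k] += 1
        # Gaussian stand-in for one evolutionary algorithm run on cluster k.
        fitness = rng.gauss(means[k], sigma)
        if fitness < best_overall:        # new best structure overall
            best_overall = fitness
            best_in_cluster[k] = fitness
            weights[k] += params.a
        elif fitness < best_in_cluster[k]:  # new best within this cluster
            best_in_cluster[k] = fitness
            weights[k] += params.b
        else:                             # no new best structure
            weights[k] -= params.c
        # Appearance penalty grows with the cluster's selection ratio.
        weights[k] -= params.d * picks[k] / step
        # Hypothetical floor so every cluster stays selectable.
        weights[k] = max(weights[k], 0.05)
    return best_overall


def grid_search(means, grid, trials=20):
    """Average best fitness for each (A, D) pair, with B and C held fixed."""
    results = {}
    for a, d in product(grid, grid):
        p = Params(a=a, b=0.1, c=0.1, d=d)
        scores = [run_search(p, means, seed=s) for s in range(trials)]
        results[(a, d)] = sum(scores) / trials
    return results


if __name__ == "__main__":
    # Hypothetical per-cluster mean fitness values (lower is better).
    means = [0.0, -0.5, -2.0, 0.5]
    results = grid_search(means, grid=[0.0, 0.5, 1.0])
    best = min(results, key=results.get)
    print("best (A, D):", best, "mean best fitness:", results[best])
```

In this toy setup, a large `a` concentrates selection on whichever cluster has already produced the best structure, while a large `d` pushes selection back toward clusters that have been chosen less often, which mirrors the exploitation vs exploration balance the abstract describes.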

## Linked entities

- **Chemicals:** quinoline (PubChem CID 7047)

## Full-text entities

- **Chemicals:** quinoline (MESH:C037219)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11342372/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11342372/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC11342372/full.md

---
Source: https://tomesphere.com/paper/PMC11342372