# Probabilistic hypergraph grammars for efficient molecular optimization

**Authors:** Egor Kraev, Mark Harley

arXiv: 1906.01845 · 2019-06-06

## TL;DR

This paper combines a probabilistic hypergraph grammar inferred from the ChEMBL database with reinforcement learning to optimize molecules efficiently, reaching competitive scores on standard benchmarks such as penalized logP and QED after only hundreds of training steps.

## Contribution

The authors infer a hypergraph replacement grammar from molecular data, turn observed rule frequencies into conditional priors, and use those priors to initialize Transformer-based policies trained with a modified policy gradient algorithm (batch-advantage).

## Key findings

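- Molecules sampled with conditional priors follow the training-set distribution more closely than those sampled with equal rule probabilities or unconditional priors
- Starting the policy from the empirical priors eliminates the need for pre-training and reaches optima faster
- Performance on penalized logP and QED is competitive after only hundreds of training steps on a budget GPU instance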
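The difference between conditional and unconditional priors can be illustrated with a minimal sketch. It assumes each observed expansion is recorded as a `(parent_rule, slot, child_rule)` tuple, where `slot` identifies which nonterminal of the parent rule was expanded. The function names, the tuple layout, and the toy rule names are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def conditional_priors(expansions):
    """Estimate P(child_rule | parent_rule, slot) from observed expansions.

    `expansions` is an iterable of (parent_rule, slot, child_rule) tuples.
    This record layout is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for parent, slot, child in expansions:
        counts[(parent, slot)][child] += 1
    priors = {}
    for context, child_counts in counts.items():
        total = sum(child_counts.values())
        priors[context] = {c: n / total for c, n in child_counts.items()}
    return priors


def unconditional_priors(expansions):
    """Estimate P(child_rule), ignoring where the rule was applied."""
    counts = Counter(child for _, _, child in expansions)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


# Toy, made-up observations (not real grammar rules).
observed = [
    ("ring", 0, "carbon"),
    ("ring", 0, "carbon"),
    ("ring", 0, "nitrogen"),
    ("chain", 0, "oxygen"),
]
cond = conditional_priors(observed)
uncond = unconditional_priors(observed)
```

In this toy data the conditional prior never proposes `"oxygen"` inside a `"ring"` context, while the unconditional prior gives it the same weight everywhere. That is the kind of context sensitivity the conditional counts capture.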

## Abstract

We present an approach to make molecular optimization more efficient. We infer a hypergraph replacement grammar from the ChEMBL database, count the frequencies of particular rules being used to expand particular nonterminals in other rules, and use these as conditional priors for the policy model. Simulating random molecules from the resulting probabilistic grammar, we show that conditional priors result in a molecular distribution closer to the training set than using equal rule probabilities or unconditional priors. We then treat molecular optimization as a reinforcement learning problem, using a novel modification of the policy gradient algorithm - batch-advantage: using individual rewards minus the batch average reward to weight the log probability loss. The reinforcement learning agent is tasked with building molecules using this grammar, with the goal of maximizing benchmark scores available from the literature. To do so, the agent has policies both to choose the next node in the graph to expand and to select the next grammar rule to apply. The policies are implemented using the Transformer architecture with the partially expanded graph as the input at each step. We show that using the empirical priors as the starting point for a policy eliminates the need for pre-training, and allows us to reach optima faster. We achieve competitive performance on common benchmarks from the literature, such as penalized logP and QED, with only hundreds of training steps on a budget GPU instance.
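The two training ideas in the abstract can be sketched in a few lines of plain Python. The first function follows the abstract's description of batch-advantage: each sample's log probability is weighted by its reward minus the batch average reward. Averaging over the batch and the sign convention (a loss to be minimized) are assumptions. The second function shows one possible way to use empirical priors as a starting point, by adding log-priors to the policy logits so that all-zero logits reproduce the prior. The abstract does not say how the authors do this, so that mechanism is purely an assumption, as are all function and parameter names.

```python
import math


def batch_advantage_loss(log_probs, rewards):
    """Policy-gradient loss weighted by (reward - batch mean reward).

    log_probs[i] is the summed log-probability of the actions that built
    molecule i; rewards[i] is its scalar score. Minimizing the result raises
    the probability of above-average samples and lowers that of the rest.
    """
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    weighted = sum(a * lp for a, lp in zip(advantages, log_probs))
    return -weighted / len(rewards)


def policy_probs(logits, prior):
    """Softmax over logits shifted by log-prior (assumed mechanism).

    With all-zero logits the output equals the prior, so an untrained
    policy starts out sampling from the empirical rule frequencies.
    """
    shifted = [
        z + (math.log(p) if p > 0 else float("-inf"))
        for z, p in zip(logits, prior)
    ]
    top = max(shifted)
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Toy numbers, chosen only to exercise the functions.
loss = batch_advantage_loss(log_probs=[-2.0, -1.0, -0.5], rewards=[0.0, 1.0, 2.0])
start = policy_probs(logits=[0.0, 0.0], prior=[0.25, 0.75])
```

Because the baseline is the batch mean, a batch in which every molecule gets the same reward contributes no gradient. Only relative differences within the batch drive the update.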

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01845/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1906.01845/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1906.01845/full.md

---
Source: https://tomesphere.com/paper/1906.01845