Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design

Jeff Guo; Philippe Schwaller

arXiv:2309.13957 · q-bio.BM · March 5, 2024

TL;DR

This paper introduces Beam Enumeration, a method for extracting meaningful substructures from language-based molecular generative models, enhancing explainability and sample efficiency, and demonstrating improved performance on molecular optimization benchmarks.

Contribution

The paper presents Beam Enumeration, a novel approach that improves explainability and sample efficiency in molecular generative models, and enhances existing algorithms like Augmented Memory.

Findings

1. Beam Enumeration effectively extracts molecular substructures.
2. Coupling with reinforcement learning improves sample efficiency.
3. The combined method achieves state-of-the-art results on benchmarks.

Abstract

Generative molecular design has moved from proof-of-concept to real-world applicability, as marked by the surge in very recent papers reporting experimental validation. Key challenges in explainability and sample efficiency present opportunities to enhance generative design to directly optimize expensive high-fidelity oracles and provide actionable insights to domain experts. Here, we propose Beam Enumeration to exhaustively enumerate the most probable sub-sequences from language-based molecular generative models and show that molecular substructures can be extracted. When coupled with reinforcement learning, extracted substructures become meaningful, providing a source of explainability and improving sample efficiency through self-conditioned generation. Beam Enumeration is generally applicable to any language-based molecular generative model and notably further improves the…
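
As a rough illustration of the enumeration idea described in the abstract, the sketch below expands every partial sequence with its most probable next tokens for a fixed number of steps and keeps all resulting sub-sequences rather than pruning to a fixed beam width. It is not the authors' implementation. The toy next-token table `TOY_MODEL`, the helpers `next_token_probs` and `enumerate_subsequences`, and the `top_k` and `steps` parameters are all hypothetical, and branching on the top-k tokens at each step is an assumption about how "most probable sub-sequences" might be enumerated.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "language model": next-token probabilities depend only on
# the previous token. A real model would condition on the whole prefix.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "^": {"C": 0.6, "c": 0.3, "N": 0.1},  # "^" is an assumed start token
    "C": {"C": 0.5, "O": 0.3, "N": 0.2},
    "c": {"c": 0.7, "n": 0.2, "C": 0.1},
    "O": {"C": 0.8, "N": 0.2},
    "N": {"C": 0.9, "O": 0.1},
    "n": {"c": 0.9, "n": 0.1},
}


def next_token_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Return the toy next-token distribution for a prefix (assumed interface)."""
    return TOY_MODEL[prefix[-1]]


def enumerate_subsequences(
    start: str = "^", top_k: int = 2, steps: int = 3
) -> List[Tuple[str, float]]:
    """Exhaustively expand the top_k tokens at every step.

    Every branch is kept, so up to top_k ** steps sub-sequences are returned,
    each paired with its joint probability under the toy model.
    """
    beams: List[Tuple[Tuple[str, ...], float]] = [((start,), 1.0)]
    for _ in range(steps):
        expanded = []
        for prefix, prob in beams:
            probs = next_token_probs(prefix)
            # Take the top_k most probable continuations of this prefix.
            best = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
            for token, p in best:
                expanded.append((prefix + (token,), prob * p))
        beams = expanded
    # Drop the start token and sort by joint probability, highest first.
    results = [("".join(prefix[1:]), prob) for prefix, prob in beams]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for subsequence, probability in enumerate_subsequences():
        print(f"{subsequence}\t{probability:.3f}")
```

With `top_k=2` and `steps=3` the toy model yields eight sub-sequences. In a real setting the token strings would be SMILES fragments, and deciding which of them correspond to valid substructures would need a cheminformatics toolkit, which this sketch leaves out.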

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Protein Structure and Dynamics