Reversible Upper Confidence Bound Algorithm to Generate Diverse Optimized Candidates

Bin Chong; Yingguang Yang; Zi-Le Wang; Hang Xing; and Zhirong Liu

arXiv:2112.14893·cs.LG·January 3, 2022

TL;DR

This paper introduces a reversible upper confidence bound (rUCB) algorithm designed to efficiently generate diverse high-reward candidates, demonstrated in virtual screening for drug discovery, reducing query times while maintaining accuracy.

Contribution

The paper presents a novel rUCB algorithm tailored for diverse candidate generation with high rewards, extending reinforcement learning applications beyond traditional reward maximization.

Findings

01

rUCB reduces query times significantly

02

Achieves high accuracy with low performance loss

03

Potential applications in multipoint optimization

Abstract

Most algorithms for the multi-armed bandit problem in reinforcement learning aim to maximize the expected reward, and are thus useful for finding the optimized candidate with the highest reward (function value) in diverse applications (e.g., AlphaGo). However, in some typical application scenarios such as drug discovery, the aim is to find a diverse set of candidates with high reward. Here we propose a reversible upper confidence bound (rUCB) algorithm for this purpose and demonstrate its application in virtual screening on intrinsically disordered proteins (IDPs). rUCB is shown to greatly reduce the query times while achieving both high accuracy and low performance loss. rUCB may have potential applications in multipoint optimization and other reinforcement-learning cases.
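
As background for the abstract's starting point, the following is a minimal sketch of the standard UCB1 rule for a multi-armed bandit, which concentrates queries on the single best arm. It is not the paper's rUCB, whose details are not given on this page. The helper names (`ucb1`, `top_k`, `bernoulli`), the exploration constant `c`, and the toy reward probabilities are all assumptions made for illustration.

```python
import math
import random


def ucb1(reward_fns, n_rounds, c=2.0, seed=0):
    """Standard UCB1 (not rUCB). Returns per-arm query counts and mean rewards."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms   # number of queries spent on each arm
    means = [0.0] * n_arms  # running mean reward of each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Query every arm once so each confidence bound is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks with counts.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def top_k(means, k):
    """Indices of the k arms with the highest estimated mean reward."""
    return sorted(range(len(means)), key=lambda a: means[a], reverse=True)[:k]


def bernoulli(p):
    """Hypothetical toy arm: reward 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Toy setup (assumed values): three arms with different success rates.
arms = [bernoulli(p) for p in (0.1, 0.5, 0.8)]
counts, means = ucb1(arms, n_rounds=2000)
best_two = top_k(means, 2)
```

Because UCB1 spends most of its budget on the arm it believes is best, the estimates for the other arms stay comparatively coarse. That is the gap the abstract points to when the goal is a diverse set of high-reward candidates rather than one winner.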
