Reversible Upper Confidence Bound Algorithm to Generate Diverse Optimized Candidates
Bin Chong, Yingguang Yang, Zi-Le Wang, Hang Xing, and Zhirong Liu

TL;DR
This paper introduces a reversible upper confidence bound (rUCB) algorithm for efficiently generating diverse high-reward candidates. It is demonstrated in virtual screening for drug discovery, where it reduces query times while maintaining accuracy.
Contribution
The paper presents a novel rUCB algorithm tailored for diverse candidate generation with high rewards, extending reinforcement learning applications beyond traditional reward maximization.
Findings
rUCB reduces query times significantly
Achieves high accuracy with low performance loss
Potential applications in multipoint optimization
Abstract
Most algorithms for the multi-armed bandit problem in reinforcement learning aim to maximize the expected reward, and are thus useful in searching for the optimized candidate with the highest reward (function value) in diverse applications (e.g., AlphaGo). However, in some typical application scenarios such as drug discovery, the aim is to search for a diverse set of candidates with high reward. Here we propose a reversible upper confidence bound (rUCB) algorithm for such a purpose, and demonstrate its application in virtual screening upon intrinsically disordered proteins (IDPs). It is shown that rUCB greatly reduces the query times while achieving both high accuracy and low performance loss. The rUCB may have potential application in multipoint optimization and other reinforcement-learning cases.
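For background, the standard (non-reversible) upper confidence bound rule that the abstract contrasts with can be sketched as follows. This is the textbook UCB1 algorithm, which targets the single best arm; it is not the authors' rUCB, whose details are not given on this page. The function name `ucb1`, the `pull` callback, and the exploration constant `c` are illustrative assumptions.

```python
import math
import random


def ucb1(pull, n_arms, n_rounds, c=2.0):
    """Textbook UCB1 (background only, NOT the paper's rUCB).

    pull(arm) -> reward is a hypothetical callback supplied by the caller.
    Returns per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    # Toy Bernoulli bandit with assumed success probabilities.
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.8]
    counts, means = ucb1(lambda a: float(rng.random() < probs[a]), 3, 2000)
    print(counts, means)
```

Because UCB1 concentrates its queries on the one arm with the highest estimated reward, it illustrates why a different strategy is needed when the goal is a diverse set of high-reward candidates.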
