Variational Search Distributions
Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla

TL;DR
VSD is a variational inference-based method for efficiently conditioning generative models on rare classes in discrete design spaces, applicable to biological sequence design and other combinatorial tasks.
Contribution
The paper introduces VSD, a novel active generation approach that leverages variational inference and scalable models for conditioned generative design.
Findings
VSD can outperform existing baseline methods on real biological sequence design tasks.
The method demonstrates scalable and efficient learning of conditional distributions.
Asymptotic convergence rates are derived for the proposed variational approach.
Abstract
We develop VSD, a method for conditioning a generative model of discrete, combinatorial designs on a rare desired class by efficiently evaluating a black-box (e.g. experiment, simulation) in a batch sequential manner. We call this task active generation; we formalize active generation's requirements and desiderata, and formulate a solution via variational inference. VSD uses off-the-shelf gradient-based optimization routines, can learn powerful generative models for desirable designs, and can take advantage of scalable predictive models. We derive asymptotic convergence rates for learning the true conditional generative distribution of designs with certain configurations of our method. After illustrating the generative model on images, we empirically demonstrate that VSD can outperform existing baseline methods on a set of real sequence-design problems in various protein and DNA/RNA…
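To make the idea in the abstract concrete, here is a minimal sketch of fitting a variational distribution over discrete sequences to a "desirable" class with gradient-based optimization. It is not the authors' implementation. It assumes a mean-field categorical distribution q, a uniform prior, a score-function (REINFORCE-style) gradient estimator, and an objective of the form E_q[log p(y > tau | x)] - KL(q || prior). The function names, the toy scoring rule in `log_prob_desirable`, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def log_prob_desirable(x, target):
    """Hypothetical stand-in for a classifier's log p(y > tau | x).

    Scores each sequence by the fraction of positions that match `target`,
    passed through a log-sigmoid. A real application would use a predictive
    model trained on black-box evaluations instead.
    """
    match_fraction = (x == target).mean(axis=-1)
    return -np.logaddexp(0.0, -10.0 * (match_fraction - 0.5))


def fit_search_distribution(target, vocab=4, steps=400, samples=64, lr=0.2, seed=0):
    """Fit a mean-field categorical q(x) by stochastic gradient ascent (assumed setup)."""
    rng = np.random.default_rng(seed)
    length = len(target)
    logits = np.zeros((length, vocab))       # variational parameters of q
    log_prior = -length * np.log(vocab)      # uniform prior over all sequences
    positions = np.arange(length)
    for _ in range(steps):
        probs = softmax(logits)
        # Inverse-CDF sampling, independently for every position.
        u = rng.random((samples, length, 1))
        x = (u > np.cumsum(probs, axis=-1)).sum(axis=-1)
        x = np.minimum(x, vocab - 1)         # guard against float round-off
        log_q = np.log(probs[positions, x]).sum(axis=-1)
        # Per-sample integrand of the objective: log-likelihood + log prior - log q.
        f = log_prob_desirable(x, target) + log_prior - log_q
        # Gradient of log q(x) with respect to the logits: one-hot(x) - probs.
        score = np.eye(vocab)[x] - probs
        # Score-function estimator with a mean baseline to reduce variance.
        grad = ((f - f.mean())[:, None, None] * score).mean(axis=0)
        logits += lr * grad
    return softmax(logits)
```

Calling `fit_search_distribution(np.array([0, 1, 2, 3, 0, 1, 2, 3]))` returns an 8-by-4 matrix of per-position token probabilities. Under these assumptions, the prior term keeps q from collapsing onto a single sequence, so the fitted distribution shifts mass toward the target tokens while remaining spread over many candidate designs.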
Peer Reviews
Decision: ICLR 2025 Poster
This paper demonstrates a clarity of thought and composition that is commendable; I particularly enjoyed the related work section. Likewise, I do not have any major concerns regarding the technical soundness of the results presented. As a good conceptual introduction to the topic, I think this draft could be useful to researchers new to the topic with some revisions.
I have two general impressions of this paper. First, it seems like the authors have not really chosen a direction for the paper. There are at least three different directions here, A) a unifying view of sequential black box optimization algorithms, B) a practical algorithm for sequential BBO, and C) theoretical analysis of convergence rates of a particular sequential BBO algorithm under strong assumptions. I would suggest you pick no more than two directions, preferably one. I actually think t
* The problem is important, as it has applications in pharmaceutical drug and enzyme design.
* The paper is well written and the method is sound.
* Experimental results on high-dimensional datasets demonstrate the superiority of the approach.
* The method lacks novelty; it is based on putting together blocks that have already been proposed in the literature.
* The paper's clarity can be improved with an overview plot of the method.
- The paper formulates the batch active search problem in the variational inference framework and provides theoretical guarantees for the learned distribution based on the sequentially attained data.
- Experimental results on real-world biological datasets demonstrate the practical use of the algorithm and its effectiveness in solving the problem.
- The precision of VSD and most other methods decreases with more rounds on the TrpB and TFBIND8 datasets, while the recall values are generally low. However, an ideal method should estimate the ground-truth super level-set distribution better as more samples are collected. This may be due to the initial training set being too large or the fitness landscape being easy to model. How do the models perform with a smaller initial training set?
- How is VSD compared with the
Taxonomy
Topics: Bayesian Methods and Mixture Models
Methods: Sparse Evolutionary Training · Variational Inference
