Biological Sequence Design with GFlowNets
Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Michael Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio

TL;DR
This paper introduces an active learning method that uses GFlowNets to efficiently generate diverse, high-scoring biological sequences, improving both the diversity and the utility of candidate batches in sequence design tasks.
Contribution
It proposes an active learning algorithm that combines epistemic uncertainty estimation with GFlowNets to generate diverse candidates, together with a scheme for incorporating existing labeled datasets to improve learning efficiency.
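Epistemic uncertainty is commonly estimated from the disagreement among an ensemble of proxy models, and a UCB-style acquisition adds that disagreement to the predicted mean so that sequences the proxies are unsure about are also favored. The sketch below is a minimal illustration under stated assumptions: bootstrap resampling for the ensemble, a hypothetical Hamming-distance k-nearest-neighbor proxy, and the illustrative names `fit_ensemble`, `ucb`, and `kappa`. It is not the paper's proxy model or its exact acquisition function.

```python
import random
import statistics


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, seq, k=3):
    """Hypothetical proxy: mean label of the k training sequences closest in Hamming distance."""
    nearest = sorted(train, key=lambda pair: hamming(pair[0], seq))[:k]
    return statistics.mean(y for _, y in nearest)


def fit_ensemble(data, n_members=5, seed=0):
    """Build an ensemble by bootstrap-resampling the labeled data once per member."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_members)]


def ucb(ensemble, seq, kappa=1.0, k=3):
    """Upper confidence bound: ensemble mean plus kappa times ensemble std (epistemic spread)."""
    preds = [knn_predict(member, seq, k) for member in ensemble]
    return statistics.mean(preds) + kappa * statistics.pstdev(preds)
```

For example, `ucb(fit_ensemble(data), "ATTA", kappa=1.0)` scores a candidate; setting `kappa=0.0` recovers pure exploitation of the mean prediction, while larger values push the search toward regions where the ensemble disagrees.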
Findings
Generates more diverse candidate batches than existing methods.
Produces higher scoring and more novel biological sequences.
Improves efficiency in biological sequence design tasks.
Abstract
Design of de novo biological sequences with desired properties, like protein and DNA sequences, often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. These experiments can consist of multiple stages, with increasing levels of precision and cost of evaluation, where candidates are filtered. This makes the diversity of proposed candidates a key consideration in the ideation phase. In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. We also propose a scheme to incorporate existing labeled datasets of candidates,…
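The diversity argument rests on GFlowNets sampling a sequence x with probability proportional to a reward R(x) rather than only maximizing it. A minimal sketch, assuming a toy DNA alphabet, a fixed length of 3, and a made-up reward standing in for the proxy or acquisition score: because sequences are built left to right, each one has a unique construction path, so the ideal forward policy is P(a | s) = F(s + a) / F(s), where the flow F(s) is the total reward of all completions of prefix s. Here F is computed by exact enumeration, which is only possible for tiny spaces; the paper's GFlowNet instead trains a neural policy, and nothing below reproduces its architecture or training objective.

```python
import itertools
import random
from collections import Counter
from functools import lru_cache

ALPHABET = "ACGT"
LENGTH = 3


def reward(seq: str) -> float:
    """Hypothetical toy reward: doubles with each G or C (stand-in for a proxy score)."""
    return 2.0 ** sum(c in "GC" for c in seq)


@lru_cache(maxsize=None)
def flow(prefix: str) -> float:
    """Exact flow F(prefix): summed reward of every complete sequence extending prefix.
    A real GFlowNet learns a policy instead of enumerating completions."""
    remaining = LENGTH - len(prefix)
    return sum(reward(prefix + "".join(t))
               for t in itertools.product(ALPHABET, repeat=remaining))


def sample(rng: random.Random) -> str:
    """Append one token at a time with P(a | s) = F(s + a) / F(s)."""
    seq = ""
    while len(seq) < LENGTH:
        weights = [flow(seq + a) for a in ALPHABET]
        seq += rng.choices(ALPHABET, weights=weights)[0]
    return seq


rng = random.Random(0)
counts = Counter(sample(rng) for _ in range(20000))
# The ratio should be near R("GGG") / R("AAA") = 8, up to sampling noise.
print(counts["GGG"] / counts["AAA"])
```

High-reward sequences are drawn more often, but lower-reward ones still appear, which is what yields diverse batches. In an active learning round, the reward would come from a learned proxy and its uncertainty rather than a fixed formula.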
Taxonomy
Topics: vaccines and immunoinformatics approaches · Computational Drug Discovery Methods · RNA and protein synthesis mechanisms
