GROOT: Effective Design of Biological Sequences with Limited Experimental Data
Thanh V. T. Tran, Nhat Khang Ngo, Viet Anh Nguyen, Truong Son Hy

TL;DR
GROOT introduces a graph-based smoothing method for biological sequence design that effectively leverages limited data to improve optimization, outperforming existing approaches without extensive labeled datasets.
Contribution
GROOT presents a novel graph-based pseudo-labeling and smoothing technique for biological sequence optimization, especially effective with limited labeled data.
Findings
GROOT outperforms existing methods on protein and biological sequence tasks.
It can extrapolate beyond training data while maintaining reliability.
It does not require extensive labeled data or access to the black-box function.
Abstract
Latent space optimization (LSO) is a powerful method for designing discrete, high-dimensional biological sequences that maximize expensive black-box functions, such as wet lab experiments. This is accomplished by learning a latent space from available data and using a surrogate model to guide optimization algorithms toward optimal outputs. However, existing methods struggle when labeled data is limited, as training the surrogate model with few labeled data points can lead to subpar outputs, offering no advantage over the training data itself. We address this challenge by introducing GROOT, a Graph-based Latent Smoothing for Biological Sequence Optimization. In particular, GROOT generates pseudo-labels for neighbors sampled around the training latent embeddings. These pseudo-labels are then refined and smoothed by Label Propagation. Additionally, we theoretically and empirically justify…
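The pseudo-labeling and smoothing steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It rests on several assumptions that are not in the abstract: neighbors are drawn with isotropic Gaussian noise, each neighbor initially inherits the label of the embedding it was sampled from, the graph is a symmetric k-nearest-neighbor graph with Gaussian edge weights, and smoothing uses a standard random-walk label propagation update. All function names and parameters (`sample_neighbors`, `knn_affinity`, `propagate_labels`, `smooth_latent_labels`, `noise_scale`, `alpha`) are hypothetical.

```python
import numpy as np


def sample_neighbors(z, num_neighbors, noise_scale, rng):
    """Draw `num_neighbors` noisy copies of every latent embedding in z (n, d)."""
    n, d = z.shape
    noise = rng.normal(scale=noise_scale, size=(n, num_neighbors, d))
    # Output is ordered as: all neighbors of z[0], then all neighbors of z[1], ...
    return (z[:, None, :] + noise).reshape(-1, d)


def knn_affinity(points, k):
    """Symmetric kNN graph with Gaussian edge weights (an assumed graph choice)."""
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)  # exclude self-loops
    nn_idx = np.argsort(sq_dist, axis=1)[:, :k]
    # Bandwidth: mean squared distance to the k nearest neighbors.
    sigma2 = np.mean(np.take_along_axis(sq_dist, nn_idx, axis=1)) + 1e-12
    rows = np.repeat(np.arange(len(points)), k)
    cols = nn_idx.ravel()
    w = np.zeros((len(points), len(points)))
    w[rows, cols] = np.exp(-sq_dist[rows, cols] / sigma2)
    return np.maximum(w, w.T)  # symmetrize


def propagate_labels(w, y0, alpha=0.9, num_iters=50):
    """Iterate y <- alpha * P y + (1 - alpha) * y0, with P = D^{-1} W."""
    p = w / w.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    y = y0.copy()
    for _ in range(num_iters):
        y = alpha * (p @ y) + (1.0 - alpha) * y0
    return y


def smooth_latent_labels(z, y, num_neighbors=4, noise_scale=0.1, k=3, seed=0):
    """Sample neighbors, assign initial pseudo-labels, and smooth them on a graph."""
    rng = np.random.default_rng(seed)
    neighbors = sample_neighbors(z, num_neighbors, noise_scale, rng)
    points = np.vstack([z, neighbors])
    # Assumption: a neighbor starts with the label of its source embedding.
    y0 = np.concatenate([y, np.repeat(y, num_neighbors)])
    smoothed = propagate_labels(knn_affinity(points, k), y0)
    return points, y0, smoothed
```

Because each update is a convex combination of neighboring values and the initial labels, the smoothed labels in this sketch never leave the range of the initial ones. The augmented points and smoothed labels could then serve as training data for a surrogate model, which is the role the abstract assigns to the pseudo-labels.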
Taxonomy
Topics: Machine Learning in Bioinformatics · Image Processing Techniques and Applications · Bacteriophages and microbial interactions
Methods: Sparse Evolutionary Training
