Forecasting labels under distribution-shift for machine-guided sequence design
Lauren Berk Wheelock, Stephen Malina, Jeffrey Gerold, Sam Sinai

TL;DR
This paper introduces a forecasting method to predict the performance distribution of biological sequence libraries, improving decision-making in machine-guided sequence design under distribution shifts.
Contribution
The authors propose a novel forecasting approach for sequence library performance, addressing the challenge of label distribution shift in machine-guided design.
Findings
Outperforms baseline methods in predicting library performance
Provides a posterior distribution of labels for high-throughput libraries
Enhances decision-making in sequence design processes
Abstract
The ability to design and optimize biological sequences with specific functionalities would unlock enormous value in technology and healthcare. In recent years, machine learning-guided sequence design has progressed this goal significantly, though validating designed sequences in the lab or clinic takes many months and substantial labor. It is therefore valuable to assess the likelihood that a designed set contains sequences of the desired quality (which often lies outside the label distribution in our training data) before committing resources to an experiment. Forecasting, a prominent concept in many domains where feedback can be delayed (e.g. elections), has not been used or studied in the context of sequence design. Here we propose a method to guide decision-making that forecasts the performance of high-throughput libraries (e.g. containing unique variants) based on estimates…
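As a rough illustration of what "assessing the likelihood that a designed set contains sequences of the desired quality" can mean in practice, the sketch below runs a simple Monte Carlo forecast. It is not the authors' method: it assumes each sequence's label is an independent Gaussian with a model-predicted mean and standard deviation, and the names `forecast_library`, `means`, `stds`, and `threshold` are hypothetical.

```python
import random


def forecast_library(means, stds, threshold, n_draws=2000, seed=0):
    """Monte Carlo forecast for one designed library (illustrative only).

    means, stds: hypothetical per-sequence predictive means and standard
                 deviations (assumed independent Gaussian labels).
    threshold:   label value that counts as "desired quality".
    Returns (p_hit, max_draws), where p_hit is the estimated probability
    that at least one sequence reaches the threshold and max_draws holds
    the sampled best label of the library for each draw.
    """
    rng = random.Random(seed)
    max_draws = []
    hits = 0
    for _ in range(n_draws):
        # Sample one plausible label per sequence in the library.
        labels = [rng.gauss(m, s) for m, s in zip(means, stds)]
        best = max(labels)
        max_draws.append(best)
        if best >= threshold:
            hits += 1
    return hits / n_draws, max_draws


# Toy library of four designs with made-up predictions.
means = [0.2, 0.5, 0.9, 1.1]
stds = [0.3, 0.3, 0.4, 0.5]
p_hit, max_draws = forecast_library(means, stds, threshold=1.5)
print(f"Estimated P(at least one design >= 1.5): {p_hit:.2f}")
```

The list `max_draws` is a sampled distribution over the best label in the library, so quantiles of it give a crude forecast of top performance. The independence and Gaussian assumptions are deliberately simplistic and would likely be too optimistic or too pessimistic under real distribution shift.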
Taxonomy
Topics: Machine Learning and Data Classification · Cell Image Analysis Techniques · Machine Learning in Materials Science
Methods: Lib
