ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods
Michal Kmicikiewicz, Vincent Fortuin, Ewa Szczurek

TL;DR
ProSpero is an active learning framework in which a surrogate model, updated from oracle feedback, guides a frozen pre-trained generative model to design novel, high-fitness protein sequences beyond wild-type neighborhoods while preserving biological plausibility.
Contribution
The paper introduces ProSpero, an active learning approach that combines a surrogate model with biologically constrained Sequential Monte Carlo sampling for robust protein design beyond wild-type regions.
Findings
ProSpero consistently outperforms or matches existing methods across diverse protein engineering tasks.
It effectively explores beyond wild-type neighborhoods while maintaining biological plausibility.
The framework remains effective even with surrogate model misspecification.
Abstract
Designing protein sequences of both high fitness and novelty is a challenging task in data-efficient protein engineering. Exploration beyond wild-type neighborhoods often leads to biologically implausible sequences or relies on surrogate models that lose fidelity in novel regions. Here, we propose ProSpero, an active learning framework in which a frozen pre-trained generative model is guided by a surrogate updated from oracle feedback. By integrating fitness-relevant residue selection with biologically-constrained Sequential Monte Carlo sampling, our approach enables exploration beyond wild-type neighborhoods while preserving biological plausibility. We show that our framework remains effective even when the surrogate is misspecified. ProSpero consistently outperforms or matches existing methods across diverse protein engineering tasks, retrieving sequences of both high fitness and…
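The loop described in the abstract (propose candidates, score them with a surrogate, query an oracle, update the surrogate) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes a uniform residue proposal in place of the frozen pre-trained generative model, a per-position averaging surrogate, a synthetic oracle that counts matches to a hidden target, and randomly chosen mutation sites instead of fitness-relevant residue selection. Biological plausibility constraints are omitted entirely.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_oracle(seq, target):
    """Hypothetical oracle: fraction of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


class AdditiveSurrogate:
    """Toy surrogate: mean observed fitness per (position, residue) pair."""

    def __init__(self, length):
        self.sums = [{} for _ in range(length)]
        self.counts = [{} for _ in range(length)]

    def update(self, seqs, fitnesses):
        for seq, y in zip(seqs, fitnesses):
            for i, aa in enumerate(seq):
                self.sums[i][aa] = self.sums[i].get(aa, 0.0) + y
                self.counts[i][aa] = self.counts[i].get(aa, 0) + 1

    def predict(self, seq):
        total = 0.0
        for i, aa in enumerate(seq):
            n = self.counts[i].get(aa, 0)
            total += self.sums[i][aa] / n if n else 0.0  # unseen pairs score 0
        return total / len(seq)


def guided_smc(parent, surrogate, rng, n_particles=32, n_sites=3, beta=20.0):
    """SMC-style sampling: mutate one site per step, reweight, resample."""
    sites = rng.sample(range(len(parent)), n_sites)  # assumption: random sites
    particles = [list(parent) for _ in range(n_particles)]
    for i in sites:
        weights = []
        for p in particles:
            before = surrogate.predict(p)
            p[i] = rng.choice(ALPHABET)  # uniform stand-in for the frozen prior
            weights.append(math.exp(beta * (surrogate.predict(p) - before)))
        # Resample particles in proportion to their surrogate-based weights.
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        particles = [list(p) for p in chosen]
    return ["".join(p) for p in particles]


def active_learning(target, wild_type, rounds=5, batch=16, seed=0):
    rng = random.Random(seed)
    surrogate = AdditiveSurrogate(len(wild_type))
    data = {wild_type: toy_oracle(wild_type, target)}
    surrogate.update(list(data), list(data.values()))
    history = [max(data.values())]
    for _ in range(rounds):
        best = max(data, key=data.get)
        candidates = guided_smc(best, surrogate, rng)
        # Deduplicate in order, drop already-measured sequences, rank by surrogate.
        unique = [s for s in dict.fromkeys(candidates) if s not in data]
        ranked = sorted(unique, key=surrogate.predict, reverse=True)[:batch]
        new = {s: toy_oracle(s, target) for s in ranked}  # oracle feedback
        surrogate.update(list(new), list(new.values()))
        data.update(new)
        history.append(max(data.values()))
    return history, data
```

Calling `active_learning("MKTAYIAKQRQI", "MKTAYIAAAAAA", rounds=3)` returns the best measured fitness after each round together with all measured sequences. The best-so-far value can never decrease because measurements accumulate, but this toy setup gives no guarantee of improvement in any particular round.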
Taxonomy
Topics: vaccines and immunoinformatics approaches · RNA and protein synthesis mechanisms · Gene Regulatory Network Analysis
