Few-shot Protein Fitness Prediction via In-context Learning and Test-time Training
Felix Teufel, Aaron W. Kollasch, Yining Huang, Ole Winther, Kevin K. Yang, Pascal Notin, Debora S. Marks

TL;DR
PRIMO is a transformer-based framework that uses in-context learning and test-time training to predict protein fitness accurately with minimal data, outperforming existing methods across diverse protein types.
Contribution
The paper introduces PRIMO, a novel transformer-based approach that combines in-context learning and test-time training for rapid adaptation in protein fitness prediction.
Findings
PRIMO outperforms zero-shot and supervised baselines.
It is effective across diverse protein families and mutation types, including substitutions and indels.
It combines large-scale pre-training with test-time adaptation.
Abstract
Accurately predicting protein fitness with minimal experimental data is a persistent challenge in protein engineering. We introduce PRIMO (PRotein In-context Mutation Oracle), a transformer-based framework that leverages in-context learning and test-time training to adapt rapidly to new proteins and assays without large task-specific datasets. By encoding sequence information, auxiliary zero-shot predictions, and sparse experimental labels from many assays as a unified token set in a pre-training masked-language modeling paradigm, PRIMO learns to prioritize promising variants through a preference-based loss function. Across diverse protein families and properties, including both substitution and indel mutations, PRIMO outperforms zero-shot and fully supervised baselines. This work underscores the power of combining large-scale pre-training with efficient test-time adaptation to tackle…
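The abstract mentions a preference-based loss but does not give its form. As a rough illustration of what such an objective can look like, the sketch below implements a generic pairwise logistic (Bradley-Terry style) ranking loss: for every pair of variants whose measured fitness differs, the model is penalized when it scores the fitter variant lower. This is an assumption made for illustration only, not necessarily the loss used in PRIMO; the function name `pairwise_preference_loss` and its arguments are hypothetical.

```python
import math
from typing import Sequence


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(scores: Sequence[float],
                             labels: Sequence[float]) -> float:
    """Mean pairwise logistic loss over all pairs with different labels.

    scores: model outputs for each variant (higher = predicted fitter).
    labels: measured fitness values for the same variants.
    NOTE: a generic ranking loss chosen for illustration; the paper's
    actual objective may differ.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            # Only pairs where variant i is measurably fitter than variant j.
            if labels[i] > labels[j]:
                margin = scores[i] - scores[j]
                # -log(sigmoid(margin)) == softplus(-margin)
                total += _softplus(-margin)
                n_pairs += 1

    # With no comparable pairs (e.g. a single variant) the loss is zero.
    return total / n_pairs if n_pairs else 0.0


# Usage: scores that agree with the measured ranking give a lower loss
# than scores that reverse it.
measured = [0.1, 0.5, 0.9]
good = pairwise_preference_loss([1.0, 2.0, 3.0], measured)
bad = pairwise_preference_loss([3.0, 2.0, 1.0], measured)
print(good < bad)
```

A loss of this kind depends only on the relative ordering of variants, which is one plausible reason to prefer it over regression when the goal is to prioritize promising variants rather than to predict absolute assay values.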
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Genomics and Rare Diseases
