Metalic: Meta-Learning In-Context with Protein Language Models
Jacob Beck, Shikha Surana, Manus McAuliffe, Oliver Bent, Thomas D. Barrett, Juan Jose Garau Luis, Paul Duckworth

TL;DR
Metalic introduces a meta-learning approach for protein property prediction that leverages in-context learning and fine-tuning, enabling effective transfer to new tasks with limited data and achieving state-of-the-art results with fewer parameters.
Contribution
This work presents a novel meta-learning framework for protein fitness prediction that improves generalization to unseen tasks and reduces model complexity.
Findings
Metalic outperforms existing models on the ProteinGym benchmark.
Fine-tuning enables strong generalization despite limited data.
The approach achieves state-of-the-art results with 18 times fewer parameters.
Abstract
Predicting the biophysical and functional properties of proteins is essential for in silico protein design. Machine learning has emerged as a promising technique for such prediction tasks. However, the relative scarcity of in vitro annotations means that these models often have little, or no, specific data on the desired fitness prediction task. As a result of limited data, protein language models (PLMs) are typically trained on general protein sequence modeling tasks, and then fine-tuned, or applied zero-shot, to protein fitness prediction. When no task data is available, the models make strong assumptions about the correlation between the protein sequence likelihood and fitness scores. In contrast, we propose meta-learning over a distribution of standard fitness prediction tasks, and demonstrate positive transfer to unseen fitness prediction tasks. Our method, called Metalic…
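The abstract describes meta-learning over a distribution of fitness prediction tasks. The snippet below is a minimal sketch of how training episodes for such a setup could be sampled, assuming each task is simply a list of (sequence, fitness) pairs. The function name `sample_episode`, the toy task names and sequences, and the context/query split sizes are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import random
from typing import Dict, List, Tuple

# Assumption: one labelled example is a (protein sequence, fitness score) pair.
Example = Tuple[str, float]


def sample_episode(
    tasks: Dict[str, List[Example]],
    n_context: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[str, List[Example], List[Example]]:
    """Pick one task at random, then draw disjoint context and query sets from it."""
    task_name = rng.choice(sorted(tasks))
    examples = tasks[task_name]
    if n_context + n_query > len(examples):
        raise ValueError("task has too few labelled sequences for this episode")
    # Sampling without replacement keeps context and query disjoint.
    chosen = rng.sample(examples, n_context + n_query)
    return task_name, chosen[:n_context], chosen[n_context:]


# Hypothetical toy tasks; real tasks would come from a benchmark such as ProteinGym.
toy_tasks: Dict[str, List[Example]] = {
    "task_a": [("MKTA", 0.1), ("MKTV", 0.4), ("MRTA", 0.9), ("MKSA", 0.3)],
    "task_b": [("GAVL", 1.2), ("GAIL", 0.7), ("GSVL", 0.2), ("GAVM", 0.5)],
}

rng = random.Random(0)
name, context, query = sample_episode(toy_tasks, n_context=2, n_query=2, rng=rng)
print(name, context, query)
```

Setting `n_context=0` in this sketch corresponds to the zero-shot case, where the model sees only query sequences and no labelled examples from the new task.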
Peer Reviews
Decision: ICLR 2025 Poster
Metalic succeeds both in providing a compute-efficient alternative to meta-learning and in delivering strong performance on downstream tasks. The parameter efficiency that Metalic achieves in the process is also noteworthy. The work is also highly reproducible. Each experiment could be reproduced by a reader from the text alone (barring which datasets were and were not used). In the appendix, readers can find a comprehensive set of hyperparameters and training details used in each exp
The only major weakness of this work is the limited number of datasets used to evaluate the proposed methods. ProteinGym performance fluctuates considerably across tasks, so it would be useful to decouple the reported performance from the comparatively small number of evaluation datasets (13 in total). A simple experiment that selects different splits and training replicates, in a manner similar to partial k-fold cross-validation, would be convincing. On a more minor note, the exact implementatio
* The idea is relatively under-explored in protein ML despite ICL having shown strong performance in other fields of ML, and the importance of few-shot learning.
* Paper is fairly clearly written
* Attention map analysis is a great addition to the work! Would love to see follow up on this.
* In Tables 1, 2, and 3a, only ESM2-8M is used for comparison. What happens if we compare to the larger ESM2 models? Can we get better zero-shot or few-shot performance via scaling pretraining parameters?
* Results in Figures 3b and 3a don't seem much better than the baselines.
* A more rigorous examination of the effects of varying query set size, etc. might be quite insightful
* Some papers have pointed out that protein likelihoods don't always indicate protein fitness, and might be due to the tra
1-By integrating in-context meta-learning with subsequent fine-tuning, Metalic can generalize well to new tasks, even though fine-tuning is not accounted for during meta-training. Moreover, the model is more computationally efficient than state-of-the-art approaches.
2-Metalic achieves strong performance in zero-shot and few-shot protein fitness prediction tasks, which is particularly valuable given the limited availability of labeled data for many protein-related tasks.
1-I anticipate a more in-depth analysis of the correlation between PLM probability and fitness values in the context of meta-learning, especially when the authors assume that this correlation may not be reliable.
2-Although in-context learning is a strength, its effectiveness diminishes for tasks that deviate significantly from the distribution of tasks used during meta-training, potentially limiting generalization to out-of-distribution tasks. Experimental results on some OOD tasks can help re
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
