Pseudo-perplexity in One Fell Swoop for Protein Fitness Estimation

Pranav Kantroo; Günter P. Wagner; Benjamin B. Machta

arXiv:2407.07265·q-bio.BM·July 11, 2024

Open Access

TL;DR

This paper introduces a fast, single-pass method to estimate protein sequence fitness using language model embeddings, achieving near state-of-the-art accuracy and enabling efficient exploration of functional sequences.

Contribution

The authors propose the One Fell Swoop approach for pseudo-perplexity estimation, significantly improving computational efficiency and performance in protein fitness prediction tasks.

Findings

01. OFS pseudo-perplexity performs nearly as well as true pseudo-perplexity.

02. OFS pseudo-perplexity achieves a new state of the art on the ProteinGym indels benchmark.

03. The method detects increased stability in ancestral protein sequences.

Abstract

Protein language models trained on the masked language modeling objective learn to predict the identity of hidden amino acid residues within a sequence using the remaining observable sequence as context. They do so by embedding the residues into a high dimensional space that encapsulates the relevant contextual cues. These embedding vectors serve as an informative context-sensitive representation that not only aids with the defined training objective, but can also be used for other tasks by downstream models. We propose a scheme to use the embeddings of an unmasked sequence to estimate the corresponding masked probability vectors for all the positions in a single forward pass through the language model. This One Fell Swoop (OFS) approach allows us to efficiently estimate the pseudo-perplexity of the sequence, a measure of the model's uncertainty in its predictions, that can also serve…
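To make the quantity concrete, here is a minimal sketch of how pseudo-perplexity is computed once per-position probability vectors are available. It uses the standard definition, the exponential of the negative mean log-probability that the model assigns to each observed residue given the rest of the sequence. The function name `pseudo_perplexity`, the dictionary representation of the probability vectors, and the toy numbers are assumptions made for illustration; they are not taken from the paper. The sketch does not implement the OFS estimator itself. It only shows the final step, which is the same whether the probability vectors come from one masked forward pass per position (the conventional route) or from a single pass over the unmasked sequence, as the abstract describes.

```python
import math


def pseudo_perplexity(sequence, position_probs):
    """Pseudo-perplexity of `sequence` given one probability vector per position.

    position_probs[i] is a dict mapping amino-acid letters to the probability
    the model assigns at position i (hypothetical representation; a real model
    would return a tensor of logits or probabilities).
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    if len(sequence) != len(position_probs):
        raise ValueError("need exactly one probability vector per residue")

    total_log_prob = 0.0
    for residue, probs in zip(sequence, position_probs):
        p = probs.get(residue, 0.0)
        if p <= 0.0:
            # The model rules out the observed residue entirely.
            return math.inf
        total_log_prob += math.log(p)

    # exp of the negative mean log-probability of the observed residues
    return math.exp(-total_log_prob / len(sequence))


# Toy inputs (made-up numbers, not model output).
alphabet = "ACDEFGHIKLMNPQRSTVWY"
seq = "MKV"

# A maximally uncertain model: uniform over the 20 amino acids at every position.
uniform = [{aa: 1.0 / len(alphabet) for aa in alphabet} for _ in seq]

# A more confident model that favors the observed residues.
confident = [{"M": 0.9, "A": 0.1}, {"K": 0.8, "R": 0.2}, {"V": 0.5, "I": 0.5}]

ppl_uniform = pseudo_perplexity(seq, uniform)      # close to the alphabet size
ppl_confident = pseudo_perplexity(seq, confident)  # lower: the model is less uncertain
```

Lower values mean the model finds the sequence less surprising, which is why the score can be read as a proxy for how plausible a sequence is under the model.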

Taxonomy

Topics: Anomaly Detection Techniques and Applications