Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention
Jeffrey D. Varner

TL;DR
This paper introduces stochastic attention (SA), a training-free sampler that generates protein sequences from small family alignments. The generated sequences are diverse and structurally plausible, and the method needs no training, no pretraining data, and no GPU.
Contribution
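The author presents stochastic attention (SA), a training-free sampler that treats the modern Hopfield energy over a protein alignment as a Boltzmann distribution and draws samples from it via Langevin dynamics. The score function is a closed-form softmax attention operation whose cost is linear in alignment size.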
Findings
Generated sequences show low amino acid compositional divergence and structural plausibility confirmed by ESMFold and AlphaFold2.
Generated sequences fold more faithfully to canonical family structures than natural members in six of eight Pfam families.
SA maintains high sequence identity while producing novel sequences quickly on a standard laptop.
Abstract
Most protein families have fewer than 100 known members, a regime where deep generative models overfit or collapse. We propose stochastic attention (SA), a training-free sampler that treats the modern Hopfield energy over a protein alignment as a Boltzmann distribution and draws samples via Langevin dynamics. The score function is a closed-form softmax attention operation requiring no training, no pretraining data, and no GPU, with cost linear in alignment size. Across eight Pfam families, SA generates sequences with low amino acid compositional divergence, substantial novelty, and structural plausibility confirmed by ESMFold and AlphaFold2. Generated sequences fold more faithfully to canonical family structures than natural members in six of eight families. Against profile HMMs, EvoDiff, and the MSA Transformer, which produce sequences that drift far outside the family, SA maintains 51…
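The sampler described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the standard modern Hopfield energy E(xi) = -(1/beta) * logsumexp(beta * X xi) + 0.5 * ||xi||^2, whose negative gradient is the softmax attention readout X^T softmax(beta * X xi) - xi, and takes unadjusted Langevin steps on it. The one-hot encoding, the toy alignment, the argmax decoding, and the parameters `beta`, `step`, `temperature`, and `n_steps` are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus the alignment gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def encode(seqs):
    """One-hot encode aligned sequences into rows of a pattern matrix X (N x d)."""
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(ALPHABET)))
    for n, seq in enumerate(seqs):
        for i, aa in enumerate(seq):
            X[n, i * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return X


def decode(xi, length):
    """Map a continuous state back to a sequence by per-position argmax."""
    idx = xi.reshape(length, len(ALPHABET)).argmax(axis=1)
    return "".join(ALPHABET[i] for i in idx)


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def hopfield_energy(xi, X, beta):
    """Modern Hopfield energy: -(1/beta) * logsumexp(beta * X xi) + 0.5 * ||xi||^2."""
    z = beta * (X @ xi)
    m = z.max()
    lse = m + np.log(np.exp(z - m).sum())
    return -lse / beta + 0.5 * (xi @ xi)


def score(xi, X, beta):
    """Negative energy gradient: one softmax attention readout minus the state."""
    return X.T @ softmax(beta * (X @ xi)) - xi


def langevin_sample(X, beta=4.0, step=0.05, temperature=0.1, n_steps=500, seed=0):
    """Unadjusted Langevin dynamics on the Hopfield energy (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    xi = X[rng.integers(len(X))].copy()  # start from a stored family member
    for _ in range(n_steps):
        noise = rng.standard_normal(xi.shape)
        xi = xi + step * score(xi, X, beta) + np.sqrt(2.0 * step * temperature) * noise
    return xi


# Toy alignment (hypothetical sequences, not taken from any Pfam family).
family = ["ACDE", "ACDF", "GCDE"]
X = encode(family)
sample = decode(langevin_sample(X), length=4)
```

Each step costs one matrix-vector product with X and one with its transpose, so the per-step cost grows linearly with the number of aligned sequences, consistent with the linear cost stated in the abstract.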
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Materials Science · Machine Learning in Bioinformatics
