TL;DR
LineageFlow is a flow-matching model that generates biologically plausible protein sequences by starting from ancestral lineage priors, improving family validity and predicted structural confidence over uniform- and mask-initialized baselines.
Contribution
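It introduces a Dirichlet flow-matching approach that initializes protein sequence generation from lineage priors derived from ancestral sequence reconstruction, improving plausibility while still exploring within-family diversity in family-aware sequence synthesis.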
Findings
Achieves family validity close to that of held-out natural sequences.
Improves predicted structural confidence over uniform- and mask-initialized baselines.
Enables objective-guided sampling with rerouting.
Abstract
Protein sequence generation for engineering requires samples that are biophysically plausible and, when targeting a family/domain, remain recognizable members while exploring within-family diversity. Current discrete generative models typically start from uniform or masked-token noise, which discards strong position-specific constraints induced by evolution and forces the model to reconstruct conserved residues from scratch, leading to weak family control and low plausibility. We propose \emph{LineageFlow}, a Dirichlet flow-matching model that initializes generation from lineage priors derived from ancestral sequence reconstruction, turning generation into structured mutation from an evolved scaffold. Across diverse protein families, LineageFlow achieves family validity close to held-out natural sequences and improves predicted structural confidence over uniform-/mask-initialized…
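The abstract contrasts starting generation from uniform noise with starting from a lineage prior. The sketch below illustrates only that difference in the starting point x_0 on the probability simplex; it does not implement the learned flow. It assumes the lineage prior is a per-position Dirichlet centred on residue marginals from ancestral sequence reconstruction. `lineage_prior_sample`, `one_hot_marginals`, `concentration` and `floor` are hypothetical names, and the paper's actual prior parameterization may differ.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues


def uniform_prior_sample(length, rng, alpha=1.0):
    """Uninformed start: each position drawn from a flat Dirichlet(alpha, ..., alpha)."""
    return rng.dirichlet(np.full(len(AMINO_ACIDS), alpha), size=length)


def one_hot_marginals(ancestor, conserved_mass=0.9):
    """Hypothetical stand-in for ASR output: most mass on the ancestral residue."""
    k = len(AMINO_ACIDS)
    m = np.full((len(ancestor), k), (1.0 - conserved_mass) / (k - 1))
    for i, aa in enumerate(ancestor):
        m[i, AMINO_ACIDS.index(aa)] = conserved_mass
    return m


def lineage_prior_sample(ancestral_marginals, rng, concentration=50.0, floor=1e-3):
    """Hypothetical lineage prior: a Dirichlet per position centred on ancestral marginals.

    ancestral_marginals: (L, 20) array, one residue distribution per position.
    concentration: total pseudo-count; larger values keep samples closer to the ancestor.
    floor: small pseudo-count so no Dirichlet parameter is exactly zero.
    """
    probs = np.asarray(ancestral_marginals, dtype=float)
    probs = probs / probs.sum(axis=1, keepdims=True)
    alphas = concentration * probs + floor
    # Generator.dirichlet takes a 1-D alpha vector, so sample position by position.
    return np.stack([rng.dirichlet(a) for a in alphas])


rng = np.random.default_rng(0)
ancestor = "MKV"
x0_lineage = lineage_prior_sample(one_hot_marginals(ancestor), rng)
x0_uniform = uniform_prior_sample(len(ancestor), rng)
# Decode each starting point by taking the most probable residue at every position.
print("".join(AMINO_ACIDS[i] for i in x0_lineage.argmax(axis=1)))
print("".join(AMINO_ACIDS[i] for i in x0_uniform.argmax(axis=1)))
```

Under these assumptions, lineage-prior starting points already place most of their mass on the conserved ancestral residues. Generation then amounts to mutating away from that scaffold. Uniform starting points carry no such position-specific signal, so the model must recover conserved residues from scratch.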
