La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
Tomas Geffner, Kieran Didi, Zhonglin Cao, Danny Reidenbach, Zuobai Zhang, Christian Dallago, Emine Kucukbenli, Karsten Kreis, Arash Vahdat

TL;DR
La-Proteina is a novel generative model that efficiently produces fully atomistic protein structures and sequences simultaneously, overcoming previous limitations in side-chain modeling and enabling scalable, high-quality protein design.
Contribution
It introduces a partially latent representation and flow matching approach for joint atomistic and sequence protein generation, achieving state-of-the-art results and scalability.
Findings
State-of-the-art performance on multiple benchmarks
Effective atomistic motif scaffolding capabilities
Scalable to proteins of up to 800 residues
Abstract
Recently, many generative models for de novo protein structure design have emerged. Yet, only few tackle the difficult task of directly generating fully atomistic structures jointly with the underlying amino acid sequence. This is challenging, for instance, because the model must reason over side chains that change in length during generation. We introduce La-Proteina for atomistic protein design based on a novel partially latent protein representation: coarse backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables of fixed dimensionality, thereby effectively side-stepping challenges of explicit side-chain representations. Flow matching in this partially latent space then models the joint distribution over sequences and full-atom structures. La-Proteina achieves state-of-the-art performance on multiple generation…
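The partially latent flow matching setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear interpolation path from Gaussian noise to data, uses alpha-carbon coordinates as the "coarse backbone", picks a hypothetical latent dimension of 8, and draws separate interpolation times for backbone and latents (suggested by the reviewers' mention of two noise schedules). All function names are illustrative.

```python
import numpy as np


def interpolate(x1, t, rng):
    """Return (x_t, target_velocity) for the linear path x_t = (1 - t) * x0 + t * x1."""
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample (assumed prior)
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0  # the velocity of a linear path is constant in t


def make_training_example(ca_coords, latents, rng):
    """Build one flow matching example over backbone coordinates and per-residue latents."""
    # Assumption: independent times for the two modalities.
    t_x = rng.uniform(0.0, 1.0)
    t_z = rng.uniform(0.0, 1.0)
    x_t, v_x = interpolate(ca_coords, t_x, rng)
    z_t, v_z = interpolate(latents, t_z, rng)
    return {"x_t": x_t, "z_t": z_t, "t_x": t_x, "t_z": t_z, "v_x": v_x, "v_z": v_z}


def flow_matching_loss(pred_v_x, pred_v_z, example):
    """Sum of mean squared errors between predicted and target velocities."""
    loss_x = np.mean((pred_v_x - example["v_x"]) ** 2)
    loss_z = np.mean((pred_v_z - example["v_z"]) ** 2)
    return float(loss_x + loss_z)


rng = np.random.default_rng(0)
num_residues, latent_dim = 64, 8  # hypothetical sizes
ca_coords = rng.standard_normal((num_residues, 3))         # stand-in backbone coordinates
latents = rng.standard_normal((num_residues, latent_dim))  # stand-in encoder outputs
example = make_training_example(ca_coords, latents, rng)
```

In a full system, a neural network would take `x_t`, `z_t`, `t_x`, and `t_z` and predict both velocities, and a separate decoder would map the generated latents and backbone to a sequence and full-atom structure. Because every residue carries a latent of the same size, the generative model never has to track a changing number of side-chain atoms.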
Peer Reviews
Decision: ICLR 2026 Poster
The motivation is reasonable, proteins' natural hierarchical structure (backbone + side chains) and variable-length side chains are suitable for partial latent generation. The method is presented clearly; training and inference hyperparameters are provided for reproducibility. The experimental part is well-designed, and they achieve SOTA results on unconditional and conditional full-atom protein generation benchmarks. They also provide many ablation and analysis experiments to support their
**W1. The La-Proteina model has only been trained and tested on monomeric proteins. Can it be trained or fine-tuned on multi-chain data (protein complex design), as APM [1] is?**
**W2. Folding can be regarded as a form of conditional full-atom generation. Can La-Proteina be evaluated on a folding benchmark?**
[1] An All-Atom Generative Model for Designing Protein Complexes
1. Strong empirical performance across the board.
2. The hybrid representation with the partially latent dimension is thoughtful. It allocates enough resolution to the key backbone portion while circumventing the need to pre-specify the number of atoms.
3. The biophysical evaluations are compelling; examining clash rates and rotamer frequencies sets a new evaluation standard that is meaningful for this subfield.
4. The technical ability to decouple the timesteps necessary is thoughtfully d
* A shortcoming of having two noise schedules is that both now need to be tuned. The authors were able to get good results regardless, but this is a small concern for practical usage.
* Not specific to this paper, but conditional generation is generally much more interesting than purely unconditional generation. The paper would be stronger if more of the work focused on that aspect.
- The proposed representation elegantly addresses a core difficulty in protein generation: the mixed discrete (sequence) and continuous (structure) nature with variable side-chain sizes. By encoding side-chain atoms and amino acid type into a fixed continuous latent per residue, La-Proteina avoids having to explicitly model discrete sequence choices and variable atom counts during generation. This is a novel solution not seen in prior protein diffusion models, which either treated all atoms expl
- A notable limitation is that La-Proteina is only demonstrated on single-chain proteins, whereas many real design tasks involve multi-chain complexes or protein–protein interactions. The authors explicitly acknowledge this as future work. While focusing on monomers is reasonable for a first step (and they already handle up to 800 residues in one chain), it means the method currently cannot design binding interfaces or multi-subunit assemblies. Competing approaches have introduced multi-chain ge
