La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching

Tomas Geffner; Kieran Didi; Zhonglin Cao; Danny Reidenbach; Zuobai Zhang; Christian Dallago; Emine Kucukbenli; Karsten Kreis; Arash Vahdat

arXiv:2507.09466 · cs.LG · July 15, 2025 · 2 citations

TL;DR

La-Proteina is a novel generative model that efficiently produces fully atomistic protein structures and sequences simultaneously, overcoming previous limitations in side-chain modeling and enabling scalable, high-quality protein design.

Contribution

It introduces a partially latent protein representation and a flow-matching approach for jointly generating amino acid sequences and fully atomistic structures, achieving state-of-the-art results and scaling to long proteins.

Findings

01. State-of-the-art performance on multiple benchmarks
02. Effective atomistic motif scaffolding capabilities
03. Scalability to proteins of up to 800 residues

Abstract

Recently, many generative models for de novo protein structure design have emerged. Yet only a few tackle the difficult task of directly generating fully atomistic structures jointly with the underlying amino acid sequence. This is challenging, for instance, because the model must reason over side chains that change in length during generation. We introduce La-Proteina for atomistic protein design based on a novel partially latent protein representation: coarse backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables of fixed dimensionality, thereby effectively sidestepping challenges of explicit side-chain representations. Flow matching in this partially latent space then models the joint distribution over sequences and full-atom structures. La-Proteina achieves state-of-the-art performance on multiple generation…
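The abstract's point about side chains that change in length during generation can be made concrete with a small sketch. The per-amino-acid side-chain heavy-atom counts are standard chemistry; representing the explicit coarse backbone by one C-alpha coordinate per residue and the latent size `LATENT_DIM = 8` are illustrative assumptions, and both function names are hypothetical rather than taken from the La-Proteina code.

```python
# Hypothetical illustration, not code from La-Proteina.

# Side-chain heavy atoms per amino acid (atoms beyond backbone N, CA, C, O).
SIDE_CHAIN_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "I": 4, "L": 4, "D": 4, "N": 4, "M": 4, "E": 5, "Q": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}

LATENT_DIM = 8  # assumed per-residue latent size, for illustration only


def explicit_side_chain_floats(sequence):
    """Number of floats needed to store explicit side-chain xyz coordinates."""
    return sum(3 * SIDE_CHAIN_ATOMS[aa] for aa in sequence)


def partially_latent_shape(sequence):
    """(residues, features): 3 C-alpha coordinates plus a fixed-size latent."""
    return (len(sequence), 3 + LATENT_DIM)


# Mutating the last residue W -> G changes the explicit side-chain size...
print(explicit_side_chain_floats("GAW"), explicit_side_chain_floats("GAG"))
# ...but not the shape of the partially latent representation.
print(partially_latent_shape("GAW"), partially_latent_shape("GAG"))
```

Here the explicit side-chain storage drops from 33 to 3 floats when tryptophan becomes glycine, while the partially latent array stays `(3, 11)`. A constant shape like this is what allows flow matching to run in a continuous space of fixed dimension even as residue identities change.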

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 10 · Confidence 4

Strengths

The motivation is reasonable: proteins' natural hierarchical structure (backbone plus side chains) and variable-length side chains make them well suited to partially latent generation. The method is presented clearly, and training and inference hyperparameters are provided for reproducibility. The experiments are well designed, and the authors achieve SOTA results on unconditional and conditional full-atom protein generation benchmarks. They also provide many ablation and analysis experiments to support their…

Weaknesses

**W1. The La-Proteina model has only been trained and tested on monomeric proteins. Can it be trained or fine-tuned on multi-chain data (protein complex design), as in APM [1]?**
**W2. Folding can be regarded as a form of conditional full-atom generation. Can La-Proteina be evaluated on a folding benchmark?**

[1] An All-Atom Generative Model for Designing Protein Complexes.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. Strong empirical performance across the board.
2. The hybrid representation with a partially latent dimension is thoughtful. It allocates enough resolution to the key backbone portion while circumventing the need to pre-specify the number of atoms.
3. The biophysical evaluations are compelling; examining clash rates and rotamer frequencies sets a new evaluation metric that is meaningful for this subfield.
4. The technical ability to decouple the necessary timesteps is thoughtfully d…
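The decoupled timesteps mentioned in this review can be illustrated with a minimal sketch of one conditional flow-matching training example, in which the explicit backbone channel `x` and the latent channel `z` each draw their own time. Straight-line (rectified-flow style) interpolation paths, uniform time sampling, Gaussian noise, and every function name here are assumptions for illustration, not La-Proteina's implementation.

```python
import random

# Minimal conditional flow-matching sketch with decoupled times (illustrative).


def interpolate(noise, data, t):
    """Point on the straight path from noise (t = 0) to data (t = 1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def velocity_target(noise, data):
    """Constant velocity of that straight path (its derivative in t)."""
    return [d - n for n, d in zip(noise, data)]


def make_training_example(x_data, z_data, rng):
    # Decoupled timesteps: t_x and t_z are drawn independently per example.
    t_x, t_z = rng.random(), rng.random()
    x_noise = [rng.gauss(0.0, 1.0) for _ in x_data]
    z_noise = [rng.gauss(0.0, 1.0) for _ in z_data]
    return {
        "x_t": interpolate(x_noise, x_data, t_x),
        "z_t": interpolate(z_noise, z_data, t_z),
        "t_x": t_x,
        "t_z": t_z,
        # Regression targets for a network that sees (x_t, z_t, t_x, t_z).
        "v_x": velocity_target(x_noise, x_data),
        "v_z": velocity_target(z_noise, z_data),
    }


example = make_training_example([1.0, 2.0, 3.0], [0.5] * 8, random.Random(0))
print(round(example["t_x"], 3), round(example["t_z"], 3))
```

Because each path is a straight line, `x_t + (1 - t_x) * v_x` recovers the clean backbone exactly; a trained network would instead have to predict `v_x` and `v_z` from the noisy inputs and both times, typically with a squared-error loss on each.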

Weaknesses

* A shortcoming of having two noise schedules is that both must now be tuned. The authors obtained good results regardless, but this is a minor concern for practical usage.
* This is not specific to this paper, but conditional generation is generally more interesting than pure unconditional generation. The paper would be strengthened if more of the work focused on that aspect.
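The practical cost of two noise schedules raised in this weakness shows up at sampling time. The toy Euler sampler below advances the backbone (`x`) and latent (`z`) channels along separate time grids, so there are two schedule parameters to tune. The power-law schedules, their exponents `p_x` and `p_z`, and the `oracle_velocity` helper (which stands in for a trained network that already knows the clean sample) are hypothetical choices, not La-Proteina's settings.

```python
# Toy Euler sampler with one time schedule per channel (hypothetical settings).


def power_schedule(num_steps, exponent):
    """Increasing times t_0 = 0, ..., t_K = 1 spaced as (k / K) ** exponent."""
    return [(k / num_steps) ** exponent for k in range(num_steps + 1)]


def oracle_velocity(state, target, t):
    """Straight-path velocity toward `target`, standing in for a trained network."""
    return [(d - s) / (1.0 - t) for s, d in zip(state, target)]


def euler_sample(x_init, z_init, x_target, z_target,
                 num_steps=50, p_x=2.0, p_z=1.0):
    # Two schedules mean two knobs to tune: p_x for the backbone, p_z for latents.
    t_x = power_schedule(num_steps, p_x)
    t_z = power_schedule(num_steps, p_z)
    x, z = list(x_init), list(z_init)
    for k in range(num_steps):
        v_x = oracle_velocity(x, x_target, t_x[k])
        v_z = oracle_velocity(z, z_target, t_z[k])
        x = [a + (t_x[k + 1] - t_x[k]) * v for a, v in zip(x, v_x)]
        z = [a + (t_z[k + 1] - t_z[k]) * v for a, v in zip(z, v_z)]
    return x, z


# Start from "noise" (zeros here for simplicity) and integrate to t = 1.
x_final, z_final = euler_sample([0.0, 0.0, 0.0], [0.0] * 4,
                                [1.0, -2.0, 0.5], [0.3] * 4)
print(x_final, z_final)
```

In a real model a single network would predict both velocities jointly from `(x, z, t_x, t_z)`, so the channels interact and the schedule choice affects sample quality. Here they evolve independently, which keeps the example checkable: with the exact straight-path velocity, Euler integration lands on the target for any increasing schedule that starts at 0 and ends at 1.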

Reviewer 03 · Rating 8 · Confidence 3

Strengths

- The proposed representation elegantly addresses a core difficulty in protein generation: the mixed discrete (sequence) and continuous (structure) nature of proteins, compounded by variable side-chain sizes. By encoding side-chain atoms and amino acid type into a fixed-size continuous latent per residue, La-Proteina avoids having to explicitly model discrete sequence choices and variable atom counts during generation. This is a novel solution not seen in prior protein diffusion models, which either treated all atoms expl…

Weaknesses

- A notable limitation is that La-Proteina is only demonstrated on single-chain proteins, whereas many real design tasks involve multi-chain complexes or protein–protein interactions. The authors explicitly acknowledge this as future work. While focusing on monomers is reasonable for a first step (and they already handle up to 800 residues in one chain), it means the method currently cannot design binding interfaces or multi-subunit assemblies. Competing approaches have introduced multi-chain ge…
