# Multimodal diffusion for joint design of protein sequence and structure

**Authors:** Shaowen Zhu, Siddhant Gulati, Yuxuan Liu, Siddhi Kotnis, Qing Sun, Yang Shen

PMC · DOI: 10.1002/pro.70340 · 2025-11-14

## TL;DR

This paper presents JointDiff, a multimodal diffusion framework that designs proteins by generating their sequence and structure jointly rather than in two separate stages.

## Contribution

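The contribution is a unified generative model, JointDiff, that co-designs protein sequence and structure in a single process. It models their joint distribution with a dedicated diffusion process for each residue modality (type, position, orientation), coupled through a shared architecture, ReverseNet.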

## Key findings

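- On unconditional monomer design, JointDiff and JointDiff-x produce structures with designability comparable to or better than two-stage models that generate sequence and structure separately, but they currently lag in sequence quality and motif scaffolding performance on computational metrics.
- The models are 1–2 orders of magnitude faster than two-stage models and support classifier-guided sampling for rapid iterative improvement.
- Several novel, evolutionarily distant green fluorescent protein (GFP) variants generated by the models show measurable fluorescence in experiments, confirming functional activity.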
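
The summary does not say how classifier guidance is implemented in JointDiff. As a rough illustration only, the sketch below shows the standard classifier-guidance idea for a Gaussian reverse step: the predicted mean is shifted along the gradient of a classifier's log-probability, scaled by the step variance. The toy classifier, the function names, and the `scale` parameter are all assumptions made for this example and are not taken from the paper or its code.

```python
import numpy as np

def log_classifier_grad(x, target):
    """Gradient of a toy log-classifier, log p(y|x) = -0.5 * ||x - target||^2.

    Hypothetical stand-in for a learned property classifier.
    """
    return target - x

def guided_reverse_step(mean, variance, x_t, target, scale, rng):
    """Sample x_{t-1} from N(mean + scale * variance * grad, variance * I).

    `mean` and `variance` would come from the denoising network; here they
    are plain inputs. The gradient is evaluated at the current sample x_t.
    """
    shifted = mean + scale * variance * log_classifier_grad(x_t, target)
    return shifted + np.sqrt(variance) * rng.standard_normal(mean.shape)

# Toy usage: 5 residues with 3D positions, guided toward the origin.
x_t = np.ones((5, 3))
mean = 0.9 * x_t          # pretend network output
target = np.zeros((5, 3))
variance = 0.04

# Same seed for both calls so the only difference is the guidance term.
unguided = guided_reverse_step(mean, variance, x_t, target, 0.0, np.random.default_rng(0))
guided = guided_reverse_step(mean, variance, x_t, target, 1.0, np.random.default_rng(0))
```

With identical noise, the guided sample differs from the unguided one by exactly `scale * variance * grad`, which is what lets guidance steer sampling without retraining the generative model.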

## Abstract

Computational design of functional proteins is of both fundamental and applied interest. This study introduces a generative framework for co‐designing protein sequence and structure in a unified process by modeling their joint distribution, with the goal of enabling cross‐modality interactions toward coherent and functional designs. Each residue is represented by three distinct modalities (type, position, and orientation) and modeled using dedicated diffusion processes: multinomial for types, Cartesian for positions, and special orthogonal group SO(3) for orientations. To couple these modalities, we propose a unified architecture, ReverseNet, which employs a shared graph attention encoder to integrate multimodal information and separate projectors to predict each modality. We benchmark our models, JointDiff and JointDiff‐x, on unconditional monomer design and conditional motif scaffolding tasks. Compared to two‐stage design models that generate sequence and structure separately, our models produce monomer structures with comparable or better designability, while currently lagging in sequence quality and motif scaffolding performance based on computational metrics. However, they are 1–2 orders of magnitude faster and support rapid iterative improvements through classifier‐guided sampling. To complement computational evaluations, we experimentally validate our approach through a case study on green fluorescent protein (GFP) design. Several novel, evolutionarily distant variants generated by our models exhibit measurable fluorescence, confirming functional activity. These results demonstrate the feasibility of joint sequence–structure generation and establish a foundation to accelerate functional protein design in future applications. Code, data, and trained models are accessible at https://github.com/Shen-Lab/JointDiff.
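
To make the three per-residue diffusion processes named in the abstract more concrete, here is a minimal NumPy sketch of one forward (noising) step for each modality. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 20-type alphabet, and the `sigma_max` parameter are hypothetical, and the orientation noise is a simplified random-rotation perturbation rather than a proper SO(3) noise distribution.

```python
import numpy as np

NUM_TYPES = 20  # assumption: the 20 standard amino acids

def noise_types(types, alpha_bar, rng):
    """Multinomial diffusion: mix each one-hot type with the uniform distribution."""
    onehot = np.eye(NUM_TYPES)[types]
    probs = alpha_bar * onehot + (1.0 - alpha_bar) / NUM_TYPES
    return np.array([rng.choice(NUM_TYPES, p=p) for p in probs])

def noise_positions(pos, alpha_bar, rng):
    """Gaussian diffusion on Cartesian coordinates."""
    eps = rng.standard_normal(pos.shape)
    return np.sqrt(alpha_bar) * pos + np.sqrt(1.0 - alpha_bar) * eps

def rotvec_to_matrix(v):
    """Rodrigues' formula: rotation vector (axis * angle) to a 3x3 rotation matrix."""
    theta = np.linalg.norm(v)
    if theta < 1e-8:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def noise_orientations(rots, alpha_bar, rng, sigma_max=1.5):
    """Simplified SO(3) noising: compose each rotation with a random rotation
    whose angle scale grows as alpha_bar decreases (assumption, not the paper's scheme)."""
    scale = sigma_max * np.sqrt(1.0 - alpha_bar)
    return np.stack([R @ rotvec_to_matrix(scale * rng.standard_normal(3)) for R in rots])

# Toy usage: an 8-residue protein at a mid-range noise level.
rng = np.random.default_rng(0)
L = 8
types = rng.integers(0, NUM_TYPES, size=L)
pos = 10.0 * rng.standard_normal((L, 3))
rots = np.stack([np.eye(3)] * L)

noisy_types = noise_types(types, 0.5, rng)
noisy_pos = noise_positions(pos, 0.5, rng)
noisy_rots = noise_orientations(rots, 0.5, rng)
```

Each function stays on its own manifold: types remain valid category indices, positions remain points in 3D space, and orientations remain rotation matrices. A denoising network in the spirit of ReverseNet would take all three noisy inputs together and predict each modality with a separate output head.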

## Linked entities

- **Proteins:** NAL1 (Protein NARROW LEAF 1)

## Full-text entities

- **Chemicals:** DDPM (-), amino acid (MESH:D000596), N (MESH:D009584), kanamycin (MESH:D007612), O (MESH:D010100), SO(3) (MESH:C011118), acid (MESH:D000143), C (MESH:D002244)
- **Species:** Escherichia coli (E. coli, species) [taxon 562]
- **Mutations:** R96A
- **Cell lines:** ESM3 — Homo sapiens (Human), Transformed cell line (CVCL_XI05)

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12617271/full.md

---
Source: https://tomesphere.com/paper/PMC12617271