Proteo-R1: Reasoning Foundation Models for De Novo Protein Design

Fang Wu; Weihao Xuan; Heli Qi; Hanqun Cao; Heng-Jui Chang; Zeqi Zhou; Haokai Zhao; Ma Jian; Carl Ma; Yu-Chi Cheng; Kuan Pang; Xiangru Tang; Zehong Wang; Guanlue Li; Hanchen Wang; Kejun Ying; Pan Lu; Chiho Im; Seungju Han; Peng Xia; Tinson Xu; Yinxi Li; Deyao Zhu; Pheng-Ann Heng; Naoto Yokoya; Masashi Sugiyama; Li Erran Li; Jure Leskovec; Yejin Choi

arXiv:2605.02937 · cs.LG · May 6, 2026

TL;DR

Proteo-R1 introduces a dual-expert framework combining reasoning and generative models for more interpretable and controllable de novo protein design, explicitly reasoning about key residues before geometric synthesis.

Contribution

The paper presents a dual-expert architecture that separates molecular understanding from geometric generation, enabling explicit residue-level reasoning during protein design.

Findings

01

Achieves stable and interpretable protein design by decoupling reasoning from geometric generation.

02

Utilizes a multimodal large language model for residue analysis and a diffusion model for constrained geometry synthesis.

03

Code, data, and demos are publicly available at https://smiles724.github.io/r1.

Abstract

Deep learning in de novo protein design has achieved atomic-level fidelity. However, existing models remain largely non-deliberative: they directly synthesize molecular geometries without explicitly reasoning about which residues or interactions are functionally essential. As a result, design decisions are entangled with continuous sampling dynamics, limiting interpretability, controllability, and systematic reuse of biochemical knowledge. We introduce Proteo-R1, a reasoning-guided protein design framework that explicitly decouples molecular understanding from geometric generation. Proteo-R1 adopts a dual-expert architecture in which a multimodal large language model (MLLM) serves as an understanding expert, analyzing protein sequences, structures, and textual context to identify key functional residues that govern binding and specificity. These…
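
The decoupling described in the abstract can be pictured as a two-stage pipeline. The Python sketch below is purely illustrative and is not taken from the Proteo-R1 codebase: the names `DesignTask`, `ResidueConstraint`, `UnderstandingExpert`, `GenerationExpert`, and `run_pipeline` are hypothetical, the toy understanding expert uses a charged-residue heuristic as a stand-in for an MLLM, and the toy generation expert fills a sequence template as a stand-in for diffusion-based geometry synthesis.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DesignTask:
    target_sequence: str  # amino-acid sequence of the protein being analyzed
    context: str          # free-text description of the design goal


@dataclass(frozen=True)
class ResidueConstraint:
    position: int   # 0-based index into target_sequence
    residue: str    # one-letter amino-acid code at that position
    rationale: str  # human-readable reason, kept for interpretability


class UnderstandingExpert(Protocol):
    """Stage 1 (hypothetical interface): decide which residues matter."""

    def identify_key_residues(self, task: DesignTask) -> list[ResidueConstraint]: ...


class GenerationExpert(Protocol):
    """Stage 2 (hypothetical interface): generate under those constraints."""

    def generate(self, task: DesignTask, constraints: list[ResidueConstraint]) -> str: ...


class ToyUnderstandingExpert:
    """Assumption: flags charged residues as 'key'. A real system would query an MLLM."""

    CHARGED = frozenset("DEKR")

    def identify_key_residues(self, task: DesignTask) -> list[ResidueConstraint]:
        return [
            ResidueConstraint(i, aa, "charged side chain (toy heuristic)")
            for i, aa in enumerate(task.target_sequence)
            if aa in self.CHARGED
        ]


class ToyGenerationExpert:
    """Assumption: keeps constrained residues and sets the rest to alanine.
    A real system would sample geometry from a diffusion model instead."""

    def generate(self, task: DesignTask, constraints: list[ResidueConstraint]) -> str:
        fixed = {c.position: c.residue for c in constraints}
        return "".join(fixed.get(i, "A") for i in range(len(task.target_sequence)))


def run_pipeline(
    task: DesignTask,
    understanding: UnderstandingExpert,
    generation: GenerationExpert,
) -> tuple[list[ResidueConstraint], str]:
    # The discrete reasoning step runs first and its output is returned
    # alongside the design, so the decision can be inspected or reused.
    constraints = understanding.identify_key_residues(task)
    design = generation.generate(task, constraints)
    return constraints, design


if __name__ == "__main__":
    task = DesignTask(target_sequence="MKDAE", context="toy example")
    constraints, design = run_pipeline(task, ToyUnderstandingExpert(), ToyGenerationExpert())
    for c in constraints:
        print(c.position, c.residue, c.rationale)
    print(design)
```

Returning the constraint list separately from the generated output mirrors the interpretability argument in the abstract: the residue-level decision is an explicit artifact that can be examined or edited before any sampling happens, rather than being implicit in the generator.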

Code & Models

Repositories

https://smiles724.github.io/r1
