VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design

Hyunjin Seo; Hongjoon Ahn; Jimin Park; Sungjun Han; Gyubok Lee; Soojung Yang; Joseph S Brown; Leo Chen; Gina El Nesr; Feyisayo Eweje; Sarah Gurev; Hyejin Lee; Cheng-Hao Liu; Junlang Liu; Zhihui Qi; Gyu Rie Lee; Sungsoo Ahn; Jamin Shin; Sangwon Jung

arXiv:2605.10978·q-bio.QM·May 19, 2026

TL;DR

VibeProteinBench is a benchmark that evaluates how well large language models handle three stages of vibe protein design (recognition, engineering, and generation) when design goals are given as open-ended natural-language instructions.

Contribution

The paper introduces an integrated, multi-stage benchmark for assessing language models in vibe protein design. It targets a gap in existing evaluation frameworks, which either cover only part of the design process or restrict design objectives to structured input schemas.

Findings

01. No current model excels across all three design stages.

02. The benchmark reveals significant challenges for generalist models in protein design.

03. In silico validation confirms biological plausibility of model outputs.

Abstract

Protein design aims to compose amino-acid sequences that fold into stable three-dimensional structures while satisfying targeted functional properties. The field is increasingly shifting toward vibe protein design, where a single model is expected to generate novel sequences, engineer existing proteins, and reason about protein characteristics through flexible natural-language constraints. Large language models (LLMs) have emerged as a leading paradigm in this space. However, existing evaluation benchmarks often limit their scope to a partial aspect of protein design, while others restrict design objectives to structured input schemas, lacking an integrated framework that evaluates the broad spectrum of protein design competence under open-ended intents. To this end, we present Vibe Protein design Benchmark (VibeProteinBench), a language-interfaced benchmark that probes generalist…
