Improving Protein Sequence Design through Designability Preference Optimization
Fanglei Xue, Andrew Kubaney, Zhichun Guo, Joseph K. Min, Ge Liu, Yi Yang, David Baker

TL;DR
This paper introduces an optimization approach for protein sequence design that raises the likelihood of sequences folding into their target structures by steering generation toward high designability, using AlphaFold pLDDT scores as the preference signal.
Contribution
It applies Direct Preference Optimization (DPO) with AlphaFold pLDDT scores as the preference signal and proposes Residue-level Designability Preference Optimization (ResiDPO), significantly improving in silico protein design success rates.
Findings
Nearly 3-fold increase in design success rate (6.56% to 17.57%)
Effective residue-level refinement of sequence design
Improved in silico success on enzyme benchmark
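The "nearly 3-fold" figure can be checked directly from the two success rates quoted in the first finding. A minimal sketch; the variable names are illustrative and only the two percentages come from the page:

```python
# Success rates quoted in the Findings list, in percent.
baseline_rate = 6.56
optimized_rate = 17.57

# Relative improvement (fold increase) and absolute improvement.
fold_increase = optimized_rate / baseline_rate
absolute_gain = optimized_rate - baseline_rate

print(f"fold increase: {fold_increase:.2f}x")
print(f"absolute gain: {absolute_gain:.2f} percentage points")
```

The ratio is about 2.68, which the page rounds to "nearly 3-fold"; the absolute gain is about 11 percentage points.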
Abstract
Protein sequence design methods have demonstrated strong performance in sequence generation for de novo protein design. However, because their training objective is sequence recovery, they do not guarantee designability, the likelihood that a designed sequence folds into the desired structure. To bridge this gap, we redefine the training objective by steering sequence generation toward high designability. To do this, we integrate Direct Preference Optimization (DPO), using AlphaFold pLDDT scores as the preference signal, which significantly improves the in silico design success rate. To further refine sequence generation at a finer, residue-level granularity, we introduce Residue-level Designability Preference Optimization (ResiDPO), which applies residue-level structural rewards and decouples optimization across residues. This enables direct improvement in designability while preserving…
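To make the sequence-level step concrete, the sketch below shows the standard DPO loss applied to a preference pair chosen by mean pLDDT. It is an illustration under assumptions, not the authors' implementation: the candidate sequences, pLDDT values, log-probabilities, the `beta` value, and the helper names `make_pair` and `dpo_loss` are all hypothetical, and picking the highest- and lowest-scoring candidates as the pair is only one possible pairing rule.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair.

    logp_* are sequence log-probabilities under the policy being trained,
    ref_logp_* under the frozen reference model. beta is an assumed value.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


def make_pair(candidates):
    """Pick (winner, loser) from (sequence, mean_pLDDT) tuples by pLDDT."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[0], ranked[-1]


# Hypothetical designs for one backbone, with made-up mean pLDDT scores.
candidates = [("MKTAYIAK", 62.1), ("MKSLEELL", 88.4), ("GSPGSPGS", 41.7)]
winner, loser = make_pair(candidates)

# When the policy equals the reference, the margin is 0 and the loss is log(2).
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# After the policy shifts probability toward the winner, the loss drops.
loss_after = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(winner, loser, loss_at_init, loss_after)
```

The loss only depends on how far the policy has moved relative to the reference on each sequence, which is why it starts at log(2) regardless of the absolute log-probabilities.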
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The work is well-motivated. The work targets a well-identified objective mismatch in protein sequence design—optimizing for sequence recovery rather than designability. 2. The presentation is clear and easy to follow.
1. The methodological novelty appears limited. The work reads primarily as a straightforward application of DPO to protein sequence design. 2. Results should be broken down by secondary-structure class (all-α, all-β, α/β, α+β). All-α targets are typically easier to design than all-β, so aggregate reporting can mask meaningful differences. Please report the fold-class composition of the evaluation sets and stratified metrics to ensure fair comparisons. 3. The paper relies heavily on pLDDT Accur
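The stratified reporting requested in point 2 of this review could be computed along the following lines. This is a sketch with made-up data: the fold-class labels, the `(fold_class, success)` record layout, and the function name `stratified_success` are assumptions for illustration and do not come from the paper.

```python
from collections import defaultdict


def stratified_success(results):
    """Group (fold_class, success) records and return {class: (n, success_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # class -> [n_designs, n_successes]
    for fold_class, success in results:
        counts[fold_class][0] += 1
        counts[fold_class][1] += int(success)
    return {c: (n, s / n) for c, (n, s) in counts.items()}


# Hypothetical per-design outcomes; real labels would come from a fold
# classification of the evaluation targets.
results = [
    ("all-alpha", True),
    ("all-alpha", True),
    ("all-beta", False),
    ("all-beta", True),
    ("alpha/beta", False),
]
by_class = stratified_success(results)
for fold_class, (n, rate) in sorted(by_class.items()):
    print(f"{fold_class}: n={n}, success rate={rate:.2f}")
```

Reporting `n` alongside each rate also exposes the fold-class composition of the evaluation set, which the reviewer asks for.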
1. The misalignment between sequence-recovery training and true designability is well motivated and clearly articulated. 2. Introducing residue-level decomposition (RPL + RCL) is a technically neat way to balance preference learning and knowledge retention, avoiding catastrophic forgetting. 3. Experiments on enzyme and binder benchmarks show consistent in silico improvements in design success rate and reasonable ablations.
1. ResiDPO is explicitly trained to generate sequences that yield higher AlphaFold2 pLDDT scores and is then evaluated using the same metric, creating a self-consistency bias that may inflate the reported gains in design success. Moreover, pLDDT measures local confidence within AlphaFold2 rather than true physical or thermodynamic stability. Using it as the sole optimization constraint is therefore too weak to capture real designability and may encourage the model to exploit AlphaFold2’s scoring
1. The residue-level optimization idea is well-motivated and fits naturally with protein structure design, where local regions can be evaluated independently and conserved regions should remain stable. 2. The method is conceptually sound and mathematically well-formulated, with clear derivations and intuitive design choices.
1. The evaluation is limited and entirely in silico. All preference signals and success metrics rely on AlphaFold2 (AF2), raising concerns that the model may simply exploit AF2’s scoring patterns rather than genuinely improving folding or stability. No cross-validation with other structure predictors (e.g., ESMFold, RoseTTAFold) or experimental validation is provided. 2. The set of baselines is rather limited, lacking comparisons with other sequence design methods such as ESM-IF, PiFold, or KW-D
The problem is well-motivated - sequence recovery doesn't equal designability, and framing this as an alignment problem makes sense. The threefold improvement on enzyme benchmarks looks good (Fig 2a). The method also generalizes decently to binder design. Decoupling the DPO loss at residue level is intuitive and seems to help based on the ablations. The dataset with residue-level pLDDT labels could be useful.
The core contribution is pretty incremental. It's basically DPO with residue-level splitting instead of sequence-level. Section 3.3 makes it sound complicated but the idea is simple if I got it right: apply preference loss where pLDDT is low, apply KL where it's high. This isn't a major conceptual advance, more a matter of good engineering. The paper oversells it as a "novel alignment algorithm." Circular evaluation. You train using AF2 pLDDT and evaluate using AF2 predictions. How do we know this actua
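The reviewer's one-line reading of the method (preference loss where pLDDT is low, KL where it is high) can be sketched as a masked per-residue loss. This follows the reviewer's paraphrase only, not the paper's actual formulation: the pLDDT threshold, `beta`, the weight `lam`, the averaging scheme, the toy 3-letter alphabet, and all function names are assumptions.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def residue_level_loss(margins, ref_probs, policy_probs, plddt,
                       threshold=80.0, beta=0.1, lam=1.0):
    """Preference term on low-pLDDT residues, KL-to-reference on the rest.

    margins[i] is the per-residue log-ratio margin between the preferred and
    dispreferred sequence; ref_probs[i] and policy_probs[i] are amino-acid
    distributions at position i. threshold, beta and lam are assumed values.
    """
    pref_terms, kl_terms = [], []
    for m, p_ref, p_pol, score in zip(margins, ref_probs, policy_probs, plddt):
        if score < threshold:
            pref_terms.append(softplus(-beta * m))  # -log sigmoid(beta * m)
        else:
            kl_terms.append(kl_divergence(p_ref, p_pol))
    pref = sum(pref_terms) / len(pref_terms) if pref_terms else 0.0
    kl = sum(kl_terms) / len(kl_terms) if kl_terms else 0.0
    return pref + lam * kl


# Toy example: 3 residues, 3-letter alphabet, policy identical to reference.
uniform = [1 / 3, 1 / 3, 1 / 3]
loss_mixed = residue_level_loss(
    margins=[0.0, 0.0, 0.0],
    ref_probs=[uniform] * 3,
    policy_probs=[uniform] * 3,
    plddt=[55.0, 92.0, 95.0],  # first residue low, the other two high
)
loss_all_high = residue_level_loss(
    margins=[0.0, 0.0, 0.0],
    ref_probs=[uniform] * 3,
    policy_probs=[uniform] * 3,
    plddt=[90.0, 92.0, 95.0],
)
print(loss_mixed, loss_all_high)
```

With the policy equal to the reference, the high-confidence residues contribute zero KL and the single low-confidence residue contributes log(2), so only the low-pLDDT positions drive the update in this toy setup.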
Taxonomy
Topics: Protein Structure and Dynamics
Methods: AlphaFold
