Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics

Jiahao Wang; Shuangjia Zheng

arXiv:2601.11012·cs.AI·January 19, 2026

TL;DR

HADES is a novel structure-aware Bayesian optimization method using Hamiltonian dynamics to efficiently design protein sequences with desired properties by integrating structural constraints and uncertainty modeling.

Contribution

The paper introduces HADES, an optimization framework for protein engineering that combines Hamiltonian dynamics with structure-aware modeling and reports better in-silico performance than the baselines it is compared against.

Findings

01. Outperforms state-of-the-art baselines in in-silico evaluations

02. Leverages structure-sequence mutual constraints for better design

03. Efficiently samples promising protein variants using Hamiltonian dynamics

Abstract

The ability to engineer optimized protein variants has transformative potential for biotechnology and medicine. Prior sequence-based optimization methods struggle with the high-dimensional complexity caused by epistasis and disregard structural constraints. To address this, we propose HADES, a Bayesian optimization method utilizing Hamiltonian dynamics to efficiently sample from a structure-aware approximated posterior. Leveraging momentum and uncertainty in the simulated physical movements, HADES enables rapid transition of proposals toward promising areas. A position discretization procedure is introduced to propose discrete protein sequences from such a continuous state system. The posterior surrogate is powered by a two-stage encoder-decoder framework to determine the structure and function relationships between mutant neighbors, consequently learning a smoothed…
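The sampling loop the abstract describes (random momentum, simulated Hamiltonian movement, then discretization into a sequence) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the quadratic `potential` stands in for the negative log of the surrogate posterior, the per-position `argmax` in `discretize` stands in for the paper's position discretization procedure, and the names `one_hot_logits`, `hmc_step`, `run_chain`, and the target string are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues


def one_hot_logits(seq, scale=10.0):
    """Hypothetical helper: a (length, 20) array with `scale` at each residue."""
    q = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        q[pos, AMINO_ACIDS.index(aa)] = scale
    return q


def potential(q, target):
    # Toy stand-in for -log posterior: a quadratic well centred on `target`.
    return 0.5 * np.sum((q - target) ** 2)


def grad_potential(q, target):
    return q - target


def leapfrog(q, p, target, step_size, n_steps):
    """Standard leapfrog integrator: half momentum step, full position steps."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_potential(q, target)
    for i in range(n_steps):
        q += step_size * p
        if i < n_steps - 1:
            p -= step_size * grad_potential(q, target)
    p -= 0.5 * step_size * grad_potential(q, target)
    return q, p


def hmc_step(q, target, rng, step_size=0.1, n_steps=10):
    """One HMC transition with fresh momentum and a Metropolis-Hastings test."""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, target, step_size, n_steps)
    h_old = potential(q, target) + 0.5 * np.sum(p0 ** 2)
    h_new = potential(q_new, target) + 0.5 * np.sum(p_new ** 2)
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new, True
    return q, False


def discretize(q):
    # Assumed discretization: pick the highest-scoring residue per position.
    return "".join(AMINO_ACIDS[i] for i in np.argmax(q, axis=1))


def run_chain(target_seq, n_iters=200, seed=0):
    """Run the toy chain from an all-zero state; return sequence and accept rate."""
    rng = np.random.default_rng(seed)
    target = one_hot_logits(target_seq)
    q = np.zeros_like(target)
    accepted = 0
    for _ in range(n_iters):
        q, ok = hmc_step(q, target, rng)
        accepted += ok
    return discretize(q), accepted / n_iters
```

Calling `run_chain("WKHY")` runs the toy chain on an arbitrary four-residue target. In a real setting the quadratic well would be replaced by a learned surrogate and its gradient, which is where the structure-aware encoder-decoder described in the abstract would enter.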

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 5

Strengths

The method achieves enhanced performance with fewer sampling steps when tested on two proteins.

Weaknesses

1. The benchmarking is not comprehensive. Experiments are limited to only two proteins (GB1 and PhoQ). More extensive testing is needed to demonstrate the efficacy of the proposed method.
2. The Method section is poorly organized and lacks key details. For example:
   a. There are no details on Bayesian optimization, although it's mentioned in Figure 1 and the Introduction.
   b. The details on the sequence encoder are not described. Do the input features include relative positional encoding?

Reviewer 02 · Rating 5 · Confidence 4

Strengths

*Originality*: While HMC is very well-studied in general, and a variety of sequence encoders/decoders have been used as proxies for Bayesian optimization of protein sequences, this combination is (to my knowledge) new. The intuition behind using a structure decoder is physically sound and worth exploring (notwithstanding some limitations described below). *Quality* & *Clarity*: The presentation of the paper is very clear and easy to follow. The analysis is reasonably thorough and well-motivated. The

Weaknesses

1. One of the main issues with this paper is the lack of recent baselines and a limited range of tasks with varying difficulties. While HADES appears to be more efficient, it only _marginally_ outperforms the existing baseline methods. Table 3, for example, shows that much of the performance gain (e.g., compared to PEX) can be attributed to an improved surrogate model rather than HMC itself. Moreover, recent literature has demonstrated significant improvements over these baselines. For instance,

Reviewer 03 · Rating 5 · Confidence 4

Strengths

The paper is clearly written. The encoder-decoder architecture aims to distill structural relatedness into the resulting surrogate fitness scores, albeit only through a shared latent embedding. Nevertheless, bringing some (latent) structural information into sequence optimization seems like a good idea.

Weaknesses

A lot of space is used to discuss Hamiltonian dynamics, though these are not, strictly speaking, followed. HMC(q,f) randomizes the momentum for each call and performs L updates of all residues, starting with the current seed q and accepting each update and its associated discretization with MH. The random momentum moves the system in a random direction, though it remains guided by the potential energy, which is defined as -log(P(f(q))). One would think that it would be advantageous to move in the continuous space (re

Reviewer 04 · Rating 3 · Confidence 5

Strengths

* Using Hamiltonian dynamics is a novel contribution.
* HADES outperforms all chosen baselines on GB1 and PhoQ.

Weaknesses

* The chosen datasets, GB1 and PhoQ, are toyish since they only require mutating up to 4 residues. This is a small search space compared to other protein engineering benchmarks, such as AAV and GFP [1], that are commonly used in many works. Even the referenced work [2] evaluates on GFP, but this dataset is not used. I understand GB1/PhoQ are desirable since they don't require training oracles, but there should still be evaluation of realistic protein engineering tasks such as AAV and GFP on top of

Taxonomy

Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Advanced Multi-Objective Optimization Algorithms