Learning Collective Variables from BioEmu with Time-Lagged Generation
Seonghyun Park, Kiyoung Seong, Soojung Yang, Rafael Gómez-Bombarelli, Sungsoo Ahn

TL;DR
This paper introduces BioEmu-CV, a framework that automatically learns collective variables from BioEmu to improve enhanced sampling in molecular dynamics, focusing on capturing slow dynamics for protein folding studies.
Contribution
It proposes a novel method that learns collective variables from BioEmu through time-lagged generation, enabling better sampling of slow molecular processes without manual CV selection.
Findings
Effective CVs for fast-folding proteins were learned.
Improved free energy estimation with on-the-fly sampling.
Enhanced transition path sampling in molecular dynamics.
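Free energy estimates from enhanced sampling depend on reweighting biased frames back to the unbiased ensemble, and their quality depends on the CV along which the bias acts. The sketch below illustrates only that generic reweighting step. It assumes a toy 1D double well U(x) = (x² − 1)², a CV equal to x, and a fixed well-tempered-style bias V = −(1 − 1/γ)U. OPES itself adapts its bias on the fly, which this sketch does not reproduce, and none of these choices come from the paper.

```python
import numpy as np

# Toy 1D double well; in this sketch the CV s(x) is simply x itself (assumption).
def potential(x):
    return (x**2 - 1.0) ** 2

kT = 0.1          # thermal energy in arbitrary units (hypothetical)
beta = 1.0 / kT
gamma = 10.0      # bias factor: the biased ensemble feels U / gamma (assumption)

# Static bias that flattens the landscape so that U + V = U / gamma.
def bias(x):
    return -(1.0 - 1.0 / gamma) * potential(x)

rng = np.random.default_rng(0)

# Draw "biased simulation" frames exactly from p_b(x) ∝ exp(-beta (U + V)) on a fine grid,
# standing in for frames produced by an enhanced-sampling run.
grid = np.linspace(-1.8, 1.8, 2001)
p_b = np.exp(-beta * (potential(grid) + bias(grid)))
p_b /= p_b.sum()
samples = rng.choice(grid, size=200_000, p=p_b)

# Reweight each frame by exp(beta V) to recover the unbiased Boltzmann distribution.
weights = np.exp(beta * bias(samples))

edges = np.linspace(-1.85, 1.85, 38)   # 37 bins of width 0.1, one centred on s = 0
hist, _ = np.histogram(samples, bins=edges, weights=weights)
centers = 0.5 * (edges[:-1] + edges[1:])

# F(s) = -kT ln p(s), shifted so its minimum is zero; empty bins are left at +inf.
free_energy = np.full_like(hist, np.inf)
occupied = hist > 0
free_energy[occupied] = -kT * np.log(hist[occupied])
free_energy -= free_energy[occupied].min()

barrier = free_energy[np.argmin(np.abs(centers))]
print(f"estimated barrier: {barrier:.3f} (exact U(0) - U(1) = 1.0)")
```

With these settings the estimated barrier should land close to the exact value of 1.0. In a real system the same reweighting only yields a converged profile if the CV separates the metastable states, which is the property a learned CV is meant to provide.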
Abstract
Molecular dynamics is crucial for understanding molecular systems, but its applicability is often limited by the vast timescales of rare events such as protein folding. Enhanced sampling techniques overcome this by accelerating the simulation along key reaction pathways, which are defined by collective variables (CVs). However, identifying effective CVs that capture the slow, macroscopic dynamics of a system remains a major bottleneck. This work proposes a novel framework, coined BioEmu-CV, that learns these essential CVs automatically from BioEmu, a recently proposed foundation model for generating protein equilibrium samples. In particular, we re-purpose BioEmu to learn time-lagged generation conditioned on the learned CV, i.e., to predict the distribution of molecular states after a certain amount of time. This training process promotes the CV to encode only the slow, long-term information…
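The intuition that time-lagged prediction filters out fast information can be checked on a toy process. The sketch below is not BioEmu-CV: it assumes a hypothetical 2D trajectory with one slow and one fast coordinate, modelled as independent AR(1) processes, and measures how much each coordinate at time t still tells us about itself at t + τ. At τ = 0 both coordinates are perfectly informative, as in plain reconstruction; at a long lag only the slow coordinate keeps any correlation, so a low-dimensional CV that conditions predictions of the lagged state gains little from encoding the fast one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 100_000
# AR(1) coefficients: coordinate 0 is slow, coordinate 1 is fast (assumption for this toy).
phi = np.array([0.99, 0.5])

# Simulate a stationary 2D AR(1) process with unit variance in each coordinate.
noise = rng.standard_normal((n_steps, 2)) * np.sqrt(1.0 - phi**2)
x = np.empty((n_steps, 2))
x[0] = rng.standard_normal(2)
for t in range(1, n_steps):
    x[t] = phi * x[t - 1] + noise[t]

def lagged_corr(traj, lag):
    """Correlation between x_t and x_{t+lag}, computed per coordinate."""
    n = len(traj) - lag
    return np.array([np.corrcoef(traj[:n, i], traj[lag:lag + n, i])[0, 1]
                     for i in range(traj.shape[1])])

for lag in (0, 50):
    print(lag, np.round(lagged_corr(x, lag), 3))
```

For an AR(1) coordinate the lag-τ correlation is φ^τ, so at τ = 50 the slow coordinate should give roughly 0.99^50 ≈ 0.6 while the fast one should be close to 0. The contrast between τ = 0 and τ = 50 is the same one a time-lagged vs. non-time-lagged training ablation would probe.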
Peer Reviews
Decision: ICLR 2026 Poster
- The integration is simple and practical: keep BioEmu frozen and train only a small conditioning head. This is an attractive engineering path if it works broadly.
- On SMD tasks, the method often reports a higher target-hitting probability and a lower maximum energy along the path, which suggests the CV is useful for guiding transitions.
- The paper provides reasonably clear setup details for OPES and SMD, which helps with reproducibility.
- The paper needs more ablations. It is unclear which components matter most. For example, it lacks comparisons of time-lagged vs. non-time-lagged training, frozen BioEmu vs. partial fine-tuning, different encoder sizes and placements, and SMD runs without mixing the learned CV with RMSD. Without these tests, it is hard to attribute the reported gains to the proposed choices, and hard to evaluate the novelty of the proposed method relative to the reported baselines.
- Restricting the CV to one dimension a…
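The two SMD metrics mentioned in the first review can be made concrete on a toy system. The sketch below is a generic steered-MD protocol, not the paper's: it assumes a 1D double well, uses the coordinate itself as the CV, pulls it with a moving harmonic restraint (k/2)(s(x) − s₀(t))² under overdamped Langevin dynamics, and reports the fraction of trajectories that ever reach a hypothetical target region (x ≥ 0.8) together with the highest physical energy each trajectory visits. All parameter values are illustrative assumptions.

```python
import numpy as np

def potential(x):
    """Toy 1D double well with minima at x = -1, +1 and a barrier of 1 at x = 0."""
    return (x**2 - 1.0) ** 2

def grad_potential(x):
    return 4.0 * x * (x**2 - 1.0)

def steered_md(n_traj=100, n_steps=5000, dt=1e-3, kT=0.1, k_spring=20.0,
               s_start=-1.0, s_end=1.0, target_min=0.8, seed=0):
    """Overdamped Langevin dynamics with a moving harmonic restraint on s(x) = x.

    The restraint centre moves linearly from s_start to s_end. The toy CV, the
    target region, and all parameter values are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_traj, s_start)
    hit = np.zeros(n_traj, dtype=bool)
    max_energy = potential(x)
    for step in range(1, n_steps + 1):
        centre = s_start + (s_end - s_start) * step / n_steps
        # Force from the physical potential plus the SMD restraint k/2 (s - centre)^2.
        force = -grad_potential(x) - k_spring * (x - centre)
        x = x + force * dt + np.sqrt(2.0 * kT * dt) * rng.standard_normal(n_traj)
        hit |= x >= target_min
        # Track the highest physical (unbiased) energy visited by each trajectory.
        max_energy = np.maximum(max_energy, potential(x))
    return hit.mean(), max_energy.mean()

hit_prob, mean_max_energy = steered_md()
print(f"target-hitting probability: {hit_prob:.2f}")
print(f"mean of per-trajectory maximum energy: {mean_max_energy:.3f}")
```

In this toy the CV is perfect by construction, so nearly every trajectory should reach the target and the maximum energy should sit near the barrier height of 1. With a learned CV on a real protein, the same two numbers indicate how closely pulling along the CV follows a low-energy transition path.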
1. The idea of capturing collective variables through a conditional generative task is novel.
2. The paper is well-structured, clearly presented, and supported by thorough experiments.
1. The method relies heavily on the capabilities of BioEmu, so the captured CV may inherit its model bias.
2. The paper somewhat overstates the scope of its CVs, especially in the title. It should make clear that the method focuses on CVs for enhanced sampling rather than general CV learning.
3. The dataset is quite limited, and the selected proteins are all very small. The method should be tested on a broader set of proteins of varying sizes to better demonstrate its generality and robustness.
* Originality: Learning latent representations from a fixed pretrained model is not new, but this paper adapts it to learning CVs for slow dynamics, which is a novel application.
* Quality: The model's predictions are compared quantitatively against state-of-the-art methods, although it would have been better if more than 3 proteins were tested.
* Clarity: It is mostly clear, apart from some typos and unclear sentences.
* Significance: Running MD simulations to investigate slow dynamics of prote…
* Testing more than 3 proteins would make the paper stronger by showing whether the model works well on a variety of proteins.
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Materials Science · Nanopore and Nanochannel Transport Studies
Methods: Test-time Local Converter · Focus
