Physio-DPO: Aligning Large Language Models with the Protein Energy Landscape to Eliminate Structural Hallucinations

QiWei Meng

arXiv:2601.00647 · cs.CL · January 5, 2026

Open Access

TL;DR

Physio-DPO is a physics-informed alignment method for protein language models that reduces structural hallucinations by incorporating thermodynamic stability, leading to more accurate and foldable protein structures.

Contribution

The paper introduces a magnitude-aware objective that aligns protein language models with the physical energy landscape, improving thermodynamic stability and reducing structural hallucinations relative to existing alignment methods.

Findings

01 Reduces RMSD to 1.28 Å

02 Achieves 92.8% foldability

03 Mitigates structural hallucinations effectively

Abstract

Large Protein Language Models have shown strong potential for generative protein design, yet they frequently produce structural hallucinations: sequences with high linguistic likelihood that fold into thermodynamically unstable conformations. Existing alignment approaches such as Direct Preference Optimization (DPO) are limited in this setting because they model preferences as binary labels and ignore the continuous structure of the physical energy landscape. We propose Physio-DPO, a physics-informed alignment framework that grounds protein language models in thermodynamic stability. Physio-DPO introduces a magnitude-aware objective that scales optimization updates according to the energy gap between native structures and physics-perturbed hard negatives. Experiments show that Physio-DPO consistently outperforms strong baselines including SFT, PPO, and standard DPO, reducing self…
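To make the contrast between a binary preference label and a magnitude-aware objective concrete, here is a minimal sketch. It is not the paper's formulation, which this page does not give. `dpo_loss` is the standard DPO loss for one preference pair. `magnitude_weighted_dpo_loss` is a hypothetical variant that multiplies that loss by a clipped weight proportional to the energy gap, so larger gaps yield proportionally larger gradient updates. The weighting form and the names `energy_gap`, `scale`, and `max_weight` are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l are policy log-likelihoods of the preferred and
    rejected sequences; ref_* are the frozen reference model's values.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def magnitude_weighted_dpo_loss(logp_w: float, logp_l: float,
                                ref_logp_w: float, ref_logp_l: float,
                                energy_gap: float,
                                beta: float = 0.1,
                                scale: float = 1.0,
                                max_weight: float = 5.0) -> float:
    """Hypothetical magnitude-aware variant (an assumption, not the paper's loss).

    energy_gap is E(rejected) - E(preferred) in arbitrary energy units;
    the DPO loss is multiplied by a weight in [0, max_weight] that grows
    linearly with the gap, so the gradient scales by the same factor.
    """
    weight = min(max(energy_gap / scale, 0.0), max_weight)
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
```

Under this sketch, a pair whose negative is only slightly less stable than the native structure contributes a small update, while a strongly destabilized negative contributes a larger one, up to the clipping bound. Standard DPO would treat both pairs identically.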

Taxonomy

Topics: Protein Structure and Dynamics · Machine Learning in Materials Science · RNA and protein synthesis mechanisms