Aligning Transformers with Continuous Feedback via Energy Rank Alignment

Shriram Chennakesavalu; Frank Hu; Sebastian Ibarraran; Grant M. Rotskoff

arXiv:2405.12961·cs.LG·October 24, 2025·2 cites

TL;DR

This paper introduces Energy Rank Alignment (ERA), a scalable, gradient-based method for optimizing autoregressive models to generate molecules and proteins with desired properties. It treats molecular search as an alignment problem in which an explicit reward function is available.

Contribution

ERA is a gradient-based optimization algorithm that uses an explicit reward function to align autoregressive generators of molecules and proteins. It is closely related to PPO and DPO but does not require reinforcement learning.

Findings

1. ERA converges to an ideal Gibbs-Boltzmann distribution.
2. It performs well with limited preference data.
3. It robustly generates molecules and proteins with specified properties.
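
For readers unfamiliar with the term, a Gibbs-Boltzmann distribution over a finite set of candidates assigns each one a probability proportional to exp(-beta * U), where U is an energy and beta an inverse temperature. The sketch below is only an illustration of that general form, assuming the energy is the negative of a reward; the energies, the `boltzmann` helper, and the choice of beta values are hypothetical and are not taken from the paper.

```python
import math


def boltzmann(energies, beta):
    """Return p_i proportional to exp(-beta * U_i) for a finite candidate set."""
    # Subtract the minimum energy before exponentiating for numerical stability.
    u_min = min(energies)
    weights = [math.exp(-beta * (u - u_min)) for u in energies]
    z = sum(weights)  # normalizing constant (partition function)
    return [w / z for w in weights]


# Hypothetical energies (negative rewards) for four candidate molecules.
energies = [0.0, 0.5, 1.0, 3.0]

# Small beta spreads probability across candidates; large beta concentrates
# it on the lowest-energy (highest-reward) candidate.
for beta in (0.1, 1.0, 10.0):
    probs = boltzmann(energies, beta)
    print(beta, [round(p, 3) for p in probs])
```

The inverse temperature controls the trade-off between sample diversity and reward: lowering beta flattens the distribution, raising it sharpens it.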

Abstract

Searching through chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large, autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack robust strategies for generating molecules with desired properties. This molecular search problem closely resembles the "alignment" problem for large language models, though for many chemical tasks we have a specific and easily evaluable reward function. Here, we introduce an algorithm called energy rank alignment (ERA) that leverages an explicit reward function to produce a gradient-based objective that we use to optimize autoregressive policies. We show theoretically that this algorithm is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but has a minimizer…
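
The idea of a gradient-based objective whose minimizer is a Gibbs-Boltzmann distribution can be illustrated with a toy problem. The sketch below is not the ERA loss from the paper, which this summary does not state; it is an assumed stand-in that fits logits over a handful of candidates by gradient descent on a pairwise cross-entropy, with soft preference targets derived from energy differences. The function names, energies, step count, and learning rate are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fit_pairwise(energies, beta, steps=2000, lr=0.5):
    """Fit logits so that model pairwise preferences match energy-based targets.

    Target: P(i preferred over j) = sigmoid(beta * (U_j - U_i)).
    Model:  P(i preferred over j) = sigmoid(logit_i - logit_j).
    """
    n = len(energies)
    logits = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                target = sigmoid(beta * (energies[j] - energies[i]))
                model = sigmoid(logits[i] - logits[j])
                # Derivative of the cross-entropy with respect to
                # (logit_i - logit_j) is (model - target).
                grad[i] += model - target
                grad[j] -= model - target
        logits = [l - lr * g / n for l, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical energies (negative rewards) for four candidates.
energies = [0.0, 0.5, 1.0, 3.0]
beta = 1.0

fitted = fit_pairwise(energies, beta)
target = softmax([-beta * u for u in energies])  # Gibbs-Boltzmann weights
print([round(p, 4) for p in fitted])
print([round(p, 4) for p in target])
```

In this toy setting the loss is minimized when each logit difference equals the scaled energy difference, so the softmax of the fitted logits coincides with the Gibbs-Boltzmann weights. No sampling-based reinforcement learning loop is involved, only gradient steps on a supervised-style loss.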

Code & Models

Videos

Aligning Transformers with Continuous Feedback via Energy Rank Alignment · SlidesLive

Taxonomy

Topics: Data Management and Algorithms · History and advancements in chemistry · Multi-Criteria Decision Making

Methods: Direct Preference Optimization · Focus · ALIGN