Aligning Transformers with Continuous Feedback via Energy Rank Alignment
Shriram Chennakesavalu, Frank Hu, Sebastian Ibarraran, Grant M. Rotskoff

TL;DR
This paper introduces Energy Rank Alignment (ERA), a scalable, gradient-based method that uses an explicit reward function to optimize autoregressive models so that they generate molecules and proteins with desired properties. It treats this molecular search problem as analogous to the alignment problem for large language models.
Contribution
ERA is a reward-driven, gradient-based optimization algorithm for molecular and protein generation. It is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO) but does not require reinforcement learning.
Findings
ERA converges to an ideal Gibbs-Boltzmann distribution.
It performs well with limited preference data.
It robustly generates molecules and proteins with specified properties.
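For readers unfamiliar with the term, a Gibbs-Boltzmann distribution assigns each candidate a probability proportional to exp(-beta * U), where U is an energy (lower is better) and beta is an inverse temperature. The following is a minimal sketch of that general definition over a small discrete set, not the paper's training objective; the function name `boltzmann_weights`, the energies, and the beta values are hypothetical and chosen only for illustration.

```python
import math


def boltzmann_weights(energies, beta=1.0):
    """Return normalized weights p_i proportional to exp(-beta * U_i).

    `energies` is a list of floats (lower energy = more desirable candidate).
    `beta` is an inverse temperature; larger beta concentrates probability
    on the lowest-energy candidates. Both are illustrative assumptions.
    """
    # Shift by the minimum energy so the largest exponent is 0 (numerical stability).
    u_min = min(energies)
    unnormalized = [math.exp(-beta * (u - u_min)) for u in energies]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


# Hypothetical energies for three candidate molecules.
energies = [0.0, 1.0, 2.0]

weights = boltzmann_weights(energies, beta=1.0)    # moderately peaked
uniform = boltzmann_weights(energies, beta=0.0)    # beta = 0 ignores the energies
sharp = boltzmann_weights(energies, beta=50.0)     # nearly all mass on the minimum
```

In this toy setting, beta controls the trade-off between concentrating on the best-scoring candidates and keeping the samples diverse.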
Abstract
Searching through chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large, autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack robust strategies for generating molecules with desired properties. This molecular search problem closely resembles the "alignment" problem for large language models, though for many chemical tasks we have a specific and easily evaluable reward function. Here, we introduce an algorithm called energy rank alignment (ERA) that leverages an explicit reward function to produce a gradient-based objective that we use to optimize autoregressive policies. We show theoretically that this algorithm is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but has a minimizer…
Taxonomy
Topics: Data Management and Algorithms · History and advancements in chemistry · Multi-Criteria Decision Making
Methods: Direct Preference Optimization · Focus · ALIGN
