Alignment of large language models with constrained learning

Botong Zhang; Shuo Li; Ignacio Hounie; Osbert Bastani; Dongsheng Ding; Alejandro Ribeiro

arXiv:2505.19387 · cs.LG · November 27, 2025

TL;DR

This paper introduces a dual-based iterative method for aligning large language models with constraints, addressing convergence issues of previous methods and providing theoretical guarantees of near-optimality.

Contribution

The paper develops a novel dual-based alignment algorithm for LLMs that guarantees convergence to near-optimal constrained policies, with theoretical analysis and empirical validation.

Findings

01. The proposed method converges to near-optimal constrained policies.
02. Theoretical bounds on the primal-dual gap are established.
03. Experimental results on RLHF datasets demonstrate the method's effectiveness.
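
The findings refer to constrained policies and a primal-dual gap. As an orientation aid, a common way to write KL-regularized constrained alignment is sketched below. The notation is assumed here and may differ from the paper's exact formulation: prompt distribution $\mathcal{D}$, reference policy $\pi_{\mathrm{ref}}$, KL weight $\beta > 0$, primary reward $r$, secondary utilities $g_i$ with thresholds $b_i$, dual variable $\lambda$, and dual step size $\eta$. The later lines give the Lagrangian maximizer over distributions, weak duality (the reason a dual value upper-bounds the primal value in distribution space), and a projected dual descent step.

```latex
\begin{aligned}
P^\star &= \max_{\pi}\ \mathbb{E}_{x\sim\mathcal{D},\,y\sim\pi(\cdot\mid x)}\big[r(x,y)\big]
  - \beta\,\mathbb{E}_{x\sim\mathcal{D}}\big[\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)\big] \\
&\qquad \text{s.t. } \mathbb{E}_{x\sim\mathcal{D},\,y\sim\pi(\cdot\mid x)}\big[g_i(x,y)\big] \ge b_i, \quad i = 1,\dots,m \\[6pt]
L(\pi,\lambda) &= \mathbb{E}_{x,y}\Big[r(x,y) + \sum_{i=1}^{m}\lambda_i\big(g_i(x,y) - b_i\big)\Big]
  - \beta\,\mathbb{E}_{x}\big[\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)\big],
  \quad \lambda \in \mathbb{R}^m_{\ge 0} \\[6pt]
\pi_\lambda &= \arg\max_{\pi} L(\pi,\lambda), \qquad
\pi_\lambda(y\mid x) \propto \pi_{\mathrm{ref}}(y\mid x)\,
  \exp\!\Big(\tfrac{1}{\beta}\big(r(x,y) + \textstyle\sum_{i}\lambda_i\, g_i(x,y)\big)\Big) \\[6pt]
D(\lambda) &= L(\pi_\lambda,\lambda), \qquad
D^\star = \min_{\lambda \ge 0} D(\lambda) \ \ge\ P^\star \quad \text{(weak duality)} \\[6pt]
\lambda_i &\leftarrow \Big[\lambda_i - \eta\,\big(\mathbb{E}_{x,\,y\sim\pi_\lambda(\cdot\mid x)}[g_i(x,y)] - b_i\big)\Big]_+
  \quad \text{(projected dual descent)}
\end{aligned}
```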

Abstract

We study the problem of computing an optimal large language model (LLM) policy for the constrained alignment problem, where the goal is to maximize a primary reward objective while satisfying constraints on secondary utilities. Despite the popularity of Lagrangian-based LLM policy search in constrained alignment, iterative primal-dual methods often fail to converge, and non-iterative dual-based methods do not achieve optimality in the LLM parameter space. To address these challenges, we employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the LLM policy via Lagrangian maximization and updating the dual variable via dual descent. In theory, we characterize the primal-dual gap between the primal value in the distribution space and the dual value in the LLM parameter space. We further quantify the optimality gap of the learned LLM…
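
To make the alternation described in the abstract concrete, here is a minimal toy sketch in Python. It assumes a single prompt, a small finite set of candidate responses, and a KL-regularized objective relative to a reference policy. It is not the paper's implementation: in the paper the Lagrangian maximization step updates the LLM's parameters, whereas this sketch replaces that step with the closed-form maximizer over distributions on the finite response set. The function names (`lagrangian_maximizer`, `dual_descent`, `expected`), the KL weight `beta`, the step size, and the iteration count are hypothetical choices for illustration.

```python
import math


def lagrangian_maximizer(ref_probs, rewards, utilities, lam, beta):
    """Closed-form maximizer of the KL-regularized Lagrangian over distributions
    on a finite response set (toy stand-in for training the LLM policy):
    pi(y) ∝ ref(y) * exp((r(y) + sum_i lam_i * g_i(y)) / beta).
    The -lam_i * b_i terms are constant in y, so they do not affect the argmax."""
    logits = []
    for y, p in enumerate(ref_probs):
        score = rewards[y] + sum(l * g[y] for l, g in zip(lam, utilities))
        logits.append(math.log(p) + score / beta)
    shift = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected(probs, values):
    """Expectation of per-response values under a distribution."""
    return sum(p * v for p, v in zip(probs, values))


def dual_descent(ref_probs, rewards, utilities, thresholds,
                 beta=1.0, step=0.5, iters=2000):
    """Alternate (1) Lagrangian maximization for the current dual variable and
    (2) a projected dual descent step. The gradient of the dual function with
    respect to lam_i is E_pi[g_i] - b_i, evaluated at the Lagrangian maximizer."""
    lam = [0.0] * len(utilities)
    for _ in range(iters):
        pi = lagrangian_maximizer(ref_probs, rewards, utilities, lam, beta)
        grads = [expected(pi, g) - b for g, b in zip(utilities, thresholds)]
        # Step against the gradient, then project onto lam >= 0.
        lam = [max(0.0, l - step * gr) for l, gr in zip(lam, grads)]
    pi = lagrangian_maximizer(ref_probs, rewards, utilities, lam, beta)
    return pi, lam


if __name__ == "__main__":
    ref = [1 / 3, 1 / 3, 1 / 3]    # hypothetical uniform reference policy
    rewards = [1.0, 0.5, 0.0]      # hypothetical primary reward per response
    safety = [[0.0, 0.5, 1.0]]     # one hypothetical secondary utility
    pi, lam = dual_descent(ref, rewards, safety, thresholds=[0.6])
    print("policy:", [round(p, 4) for p in pi])
    print("dual variable:", [round(l, 4) for l in lam])
    print("expected utility:", round(expected(pi, safety[0]), 4))
```

With these toy numbers the unconstrained optimum (dual variable zero) has an expected utility of about 0.34, so the constraint is active and the dual variable settles near 1.61, where the expected utility meets the threshold of 0.6. If the threshold is loose enough to be satisfied at zero, the projection keeps the dual variable at zero and the policy stays at the unconstrained optimum.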

Videos

Alignment of Large Language Models with Constrained Learning (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques