Oracle-Robust Online Alignment for Large Language Models

Zimeng Li; Mudit Gaur; Vaneet Aggarwal

arXiv:2602.20457·cs.LG·February 25, 2026

TL;DR

This paper introduces a robust online alignment method for large language models that accounts for oracle uncertainty, providing a worst-case optimization framework with theoretical convergence guarantees.

Contribution

It formulates an oracle-robust online alignment objective with a closed-form decomposition and develops a stochastic update algorithm with proven complexity bounds.

Findings

01

Exact closed-form decomposition for the robust objective.

02

Projected stochastic updates for weakly convex functions.

03

Proven $\tilde{O}(\varepsilon^{-2})$ oracle complexity.

Abstract

We study online alignment of large language models under misspecified preference feedback, where the observed preference oracle deviates from an ideal but unknown ground-truth oracle. The online LLM alignment problem is a bi-level reinforcement problem due to the coupling between data collection and policy updates. Recently, the problem has been reduced to a tractable single-level objective in the SAIL (Self-Improving Efficient Online Alignment) framework. In this paper, we introduce a pointwise oracle uncertainty set in this problem and formulate an oracle-robust online alignment objective as a worst-case optimization problem. For log-linear policies, we show that this robust objective admits an exact closed-form decomposition into the original loss function plus an explicit sensitivity penalty. We develop projected stochastic composite updates for the resulting weakly convex objective…
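To make the "loss plus sensitivity penalty" structure concrete, here is a minimal sketch of a projected stochastic subgradient loop on an objective of that general shape. It is not the paper's algorithm: the logistic preference loss, the penalty `rho * |margin|`, the feature-difference vectors `dphi`, the L2-ball constraint, and all function names are assumptions chosen for illustration, and a plain subgradient step stands in for the composite update mentioned in the abstract.

```python
import numpy as np


def project_l2_ball(theta, radius):
    """Euclidean projection onto {theta : ||theta||_2 <= radius}."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)


def robust_loss_and_subgrad(theta, dphi, rho):
    """Per-sample objective: logistic preference loss + placeholder penalty.

    dphi is a hypothetical feature difference between the preferred and the
    rejected response under a log-linear policy. The penalty rho * |margin|
    is an illustrative stand-in, NOT the sensitivity penalty from the paper.
    """
    margin = dphi @ theta
    loss = np.logaddexp(0.0, -margin)             # log(1 + exp(-margin))
    sig_neg = 0.5 * (1.0 - np.tanh(margin / 2))   # stable sigmoid(-margin)
    grad_loss = -sig_neg * dphi
    penalty = rho * abs(margin)
    subgrad_penalty = rho * np.sign(margin) * dphi
    return loss + penalty, grad_loss + subgrad_penalty


def projected_sgd(dphis, rho=0.1, radius=5.0, steps=2000, lr=0.05, seed=0):
    """Sample one comparison per step, take a subgradient step, then project."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dphis.shape[1])
    for t in range(steps):
        i = rng.integers(len(dphis))
        _, g = robust_loss_and_subgrad(theta, dphis[i], rho)
        step = lr / np.sqrt(t + 1)                # decaying step size
        theta = project_l2_ball(theta - step * g, radius)
    return theta
```

The projection keeps the iterates in a bounded set, which is the usual reason for projecting when the objective is only weakly convex. The decaying step size is a common default here, not a schedule taken from the paper.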

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Game Theory and Voting Systems