Human-in-the-loop: Real-time Preference Optimization

Wenbin Wang; Wenjie Xu; Colin N. Jones

arXiv:2506.02225 · math.OC · March 31, 2026


TL;DR

This paper introduces an online feedback optimization controller that uses pairwise comparison feedback to optimize user utility in real time, with guarantees of closed-loop stability and convergence when the controller interacts with a dynamic (nonlinear) plant.

Contribution

The paper presents a novel online controller that adds a random exploration signal and guarantees closed-loop stability and convergence for preference-based optimization.
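A minimal sketch of why random exploration plus a single binary comparison can carry descent information. It assumes a hypothetical noiseless user who prefers whichever input has the lower (hidden) cost; the cost function, the oracle `prefers_new`, and the helper `comparison_descent_direction` are illustrative names, not the authors' controller. Each one-bit estimate is just the exploration direction, kept or flipped, yet on average it points along the negative gradient.

```python
import numpy as np


def prefers_new(cost, u_new, u_old):
    """Hypothetical noiseless user: +1.0 if u_new is preferred (lower cost), else -1.0."""
    return 1.0 if cost(u_new) < cost(u_old) else -1.0


def comparison_descent_direction(cost, u, delta, rng):
    """One-bit descent-direction estimate from a single random exploration step."""
    v = rng.standard_normal(u.shape)
    v /= np.linalg.norm(v)                    # random unit exploration direction
    s = prefers_new(cost, u + delta * v, u)   # only a binary comparison is observed
    return s * v                              # keep v if it helped, flip it otherwise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u_star = np.array([1.0, -2.0, 0.5])       # hypothetical optimum, unknown to the controller

    def cost(u):
        return float(np.sum((u - u_star) ** 2))

    u = np.zeros(3)
    # Averaging many one-bit estimates recovers the negative-gradient direction.
    estimates = [comparison_descent_direction(cost, u, 1e-3, rng) for _ in range(5000)]
    d = np.mean(estimates, axis=0)
    cos = d @ (u_star - u) / (np.linalg.norm(d) * np.linalg.norm(u_star - u))
    print(f"cosine with the negative gradient: {cos:.3f}")
```

For a direction drawn uniformly on the unit sphere, the expected value of `s * v` is a positive multiple of the normalized negative gradient, which is what lets a gradient-style update run on comparisons alone.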

Findings

1. The controller converges to the optimal point under mild assumptions.

2. The approach guarantees closed-loop stability.

3. Numerical experiments validate the theoretical analysis.

Abstract

Optimization with preference feedback is an active research area with many applications in engineering systems where humans play a central role, such as building control and autonomous vehicles. While most existing studies focus on optimizing a static user utility, few have investigated its closed-loop behavior that accounts for system transients. In this work, we propose an online feedback optimization controller that optimizes user utility using pairwise comparison feedback with both optimality and closed-loop stability guarantees. By adding a random exploration signal, the controller estimates the descent direction based on the binary comparison feedback between two consecutive time steps. We analyze its closed-loop behavior when interacting with a nonlinear plant and show that, under mild assumptions, the controller converges to the optimal point without inducing instability.…
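A toy closed-loop sketch of the mechanism described in the abstract, under strong simplifying assumptions: a hypothetical stable plant with an elementwise nonlinear steady-state map, a noiseless user whose hidden cost is the squared distance to a hypothetical reference output, a unit-norm random exploration signal of amplitude `delta`, and a diminishing step size chosen for illustration. The user compares the outputs at two consecutive time steps, so plant transients leak into the feedback. This is not the paper's controller or tuning; `steady_state`, `plant_step`, `user_cost`, and `run_closed_loop` are hypothetical names.

```python
import numpy as np


def steady_state(u):
    """Hypothetical monotone nonlinear steady-state input-output map h(u)."""
    return u + 0.2 * np.sin(u)


def plant_step(y, u, a=0.3):
    """Hypothetical stable plant: the output relaxes toward steady_state(u) at rate a."""
    return a * y + (1.0 - a) * steady_state(u)


def user_cost(y, y_ref):
    """Hypothetical user discomfort; the controller only sees comparisons of it."""
    return float(np.sum((y - y_ref) ** 2))


def run_closed_loop(y_ref, steps=4000, delta=0.1, eta0=0.05, seed=0):
    rng = np.random.default_rng(seed)
    u_bar = np.zeros_like(y_ref)        # nominal input being optimized
    y = steady_state(u_bar)             # start at the equilibrium of the initial input
    for k in range(steps):
        v = rng.standard_normal(u_bar.shape)
        v /= np.linalg.norm(v)                      # random exploration direction
        y_next = plant_step(y, u_bar + delta * v)   # apply the perturbed input for one step
        # Binary feedback: is the new output preferred to the previous one?
        s = 1.0 if user_cost(y_next, y_ref) < user_cost(y, y_ref) else -1.0
        eta = eta0 / np.sqrt(1.0 + k / 100.0)       # diminishing step size (assumption)
        u_bar = u_bar + eta * s * v                 # move toward the preferred side
        y = y_next
    return u_bar, y
```

For example, `run_closed_loop(np.array([1.0, -0.5]))` returns a nominal input whose steady-state output lies near the reference. In this toy setup the exploration amplitude is kept well above the lag caused by the plant dynamics, so each comparison is dominated by the effect of the current perturbation; the closed-loop output still jitters on the order of `delta` because the exploration never stops.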
