Human-in-the-loop: Real-time Preference Optimization
Wenbin Wang, Wenjie Xu, Colin N. Jones

TL;DR
This paper introduces an online feedback optimization controller that uses pairwise comparison feedback to optimize user utility in real time, with guarantees of closed-loop stability and convergence when the controller interacts with a dynamic system.
Contribution
The paper presents an online controller that adds a random exploration signal to estimate a descent direction from binary comparisons between consecutive time steps, with guarantees of closed-loop stability and convergence.
Findings
The controller converges to the optimal point under mild assumptions.
The approach guarantees closed-loop stability.
Numerical experiments validate the theoretical analysis.
Abstract
Optimization with preference feedback is an active research area with many applications in engineering systems where humans play a central role, such as building control and autonomous vehicles. While most existing studies focus on optimizing a static user utility, few have investigated its closed-loop behavior that accounts for system transients. In this work, we propose an online feedback optimization controller that optimizes user utility using pairwise comparison feedback with both optimality and closed-loop stability guarantees. By adding a random exploration signal, the controller estimates the descent direction based on the binary comparison feedback between two consecutive time steps. We analyze its closed-loop behavior when interacting with a nonlinear plant and show that, under mild assumptions, the controller converges to the optimal point without inducing instability.…
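The idea in the abstract (perturb the input with a random exploration signal, ask the user to compare two consecutive inputs, and step in the preferred direction) can be illustrated with a minimal Python sketch. This is not the authors' controller: it assumes a static plant with no transients, a hypothetical quadratic utility with optimum `u_star`, a noiseless user, and illustrative values for the step size `eta` and exploration radius `delta`. All function names are made up for this example.

```python
import math
import random


def utility(u, u_star=(2.0, -1.0)):
    """Hypothetical user utility, highest at u_star (unknown to the controller)."""
    return -sum((ui - si) ** 2 for ui, si in zip(u, u_star))


def prefers_new(u_new, u_old):
    """Binary comparison feedback: True if the user prefers u_new over u_old."""
    return utility(u_new) > utility(u_old)


def random_unit_vector(dim, rng):
    """Sample an exploration direction uniformly on the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def run_preference_ascent(x0=(0.0, 0.0), steps=3000, eta=0.02, delta=0.1, seed=0):
    """Comparison-based ascent on a static plant (an assumption of this sketch)."""
    rng = random.Random(seed)
    x = list(x0)        # nominal input maintained by the controller
    u_prev = list(x0)   # input applied at the previous time step
    for _ in range(steps):
        # Apply the nominal input plus a random exploration signal.
        w = random_unit_vector(len(x), rng)
        u = [xi + delta * wi for xi, wi in zip(x, w)]
        # Direction between the two consecutive applied inputs.
        d = [a - b for a, b in zip(u, u_prev)]
        norm = math.sqrt(sum(c * c for c in d))
        if norm > 1e-12:
            # Move along d if the new input was preferred, against it otherwise.
            s = 1.0 if prefers_new(u, u_prev) else -1.0
            x = [xi + eta * s * di / norm for xi, di in zip(x, d)]
        u_prev = u
    return x


if __name__ == "__main__":
    x_final = run_preference_ascent()
    print("final nominal input:", x_final)
```

With a constant step size, this sketch is only expected to settle in a small neighborhood of the optimum, with a radius on the order of `eta` and `delta`. The paper's analysis additionally accounts for the transients of a nonlinear plant, which this example ignores.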
