Safe Online Convex Optimization with Multi-Point Feedback
Spencer Hutchinson, Mahnoosh Alizadeh

TL;DR
This paper introduces a safe online convex optimization algorithm that uses multi-point zero-order feedback to achieve sublinear regret and zero constraint violation, suitable for safety-critical applications.
Contribution
It proposes a novel algorithm combining forward-difference gradient estimation with optimistic and pessimistic action sets for safety and efficiency.
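One way such optimistic and pessimistic sets can be built is sketched below. This is an illustration under stated assumptions, not the paper's exact construction: it assumes a single constraint $g(x) \le 0$ where $g$ is $M$-smooth and $\mu$-strongly convex, a point $x_t$ at which $g$ and its gradient are known, and it ignores the gradient estimation error that a real algorithm would have to account for. The symbols $g$, $M$, $\mu$, $x_t$, $\mathcal{P}_t$ and $\mathcal{O}_t$ are introduced here for illustration only.

```latex
% Pessimistic (inner) set: the smoothness upper bound on g certifies safety.
\mathcal{P}_t = \left\{ x : g(x_t) + \nabla g(x_t)^\top (x - x_t) + \tfrac{M}{2}\,\|x - x_t\|^2 \le 0 \right\}

% Optimistic (outer) set: the strong-convexity lower bound on g cannot rule x out.
\mathcal{O}_t = \left\{ x : g(x_t) + \nabla g(x_t)^\top (x - x_t) + \tfrac{\mu}{2}\,\|x - x_t\|^2 \le 0 \right\}

% The two bounds sandwich g, so the true feasible set lies between them.
\mathcal{P}_t \subseteq \{ x : g(x) \le 0 \} \subseteq \mathcal{O}_t
```

Under these assumptions, any action chosen from the pessimistic set is guaranteed feasible, while the optimistic set is a superset of the feasible region and so never excludes a feasible action.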
Findings
Achieves $\tilde{O}(d \, \sqrt{T})$ regret with zero constraint violation.
Effectively handles unknown constraints and zero-order feedback.
Demonstrates empirical performance through numerical studies.
Abstract
Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $\tilde{O}(d \, \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the…
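The forward-difference gradient estimation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `forward_difference_gradient`, the step size `delta`, and the quadratic `cost` are hypothetical choices made for this example. The sketch only shows why one base query plus one perturbed query per coordinate is enough to estimate a gradient from function values; it does not address how to keep the perturbed query points themselves feasible.

```python
from typing import Callable, List, Sequence


def forward_difference_gradient(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    delta: float = 1e-6,  # hypothetical step size, chosen for illustration
) -> List[float]:
    """Estimate the gradient of f at x using len(x) + 1 function evaluations."""
    base = f(x)  # one query at the base point
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += delta  # one extra query per coordinate direction
        grad.append((f(shifted) - base) / delta)
    return grad


def cost(x: Sequence[float]) -> float:
    """Hypothetical smooth cost used only for this example."""
    return sum(xi * xi for xi in x)


# The exact gradient of this cost at (1, -2, 0.5) is (2, -4, 1).
estimate = forward_difference_gradient(cost, [1.0, -2.0, 0.5])
print(estimate)
```

For a smooth function the bias of each coordinate estimate shrinks with `delta`, which is why the same evaluations can serve for both the cost function and the constraint function.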
Taxonomy
Topics: Advanced Wireless Network Optimization · Advanced Bandit Algorithms Research · Smart Parking Systems Research
