Risk-averse learning with delayed feedback
Siyi Wang, Zifan Wang, Karl Henrik Johansson, Sandra Hirche

TL;DR
This paper develops risk-averse learning algorithms that use CVaR under delayed feedback, analyzes their regret bounds, and demonstrates improved performance in dynamic pricing scenarios.
Contribution
It introduces two zeroth-order risk-averse learning algorithms (one-point and two-point) that handle delayed feedback, and provides dynamic regret bounds expressed in terms of the cumulative delay and the total number of samples.
Findings
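In the absence of delay, the regret bounds of both algorithms match the established bounds of zeroth-order stochastic gradient methods for risk-averse learning.
The two-point method achieves a smaller regret bound than the one-point method.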
Numerical experiments validate the effectiveness of the proposed algorithms.
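To make the one-point versus two-point distinction concrete, here is a minimal sketch of the two generic zeroth-order gradient estimators. It is an illustration of the standard textbook estimators, not the paper's algorithms: it ignores delays and CVaR, and the names `one_point_estimate`, `two_point_estimate`, `random_unit_vector`, and the smoothing radius `delta` are assumptions introduced here.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere (hypothetical helper)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def one_point_estimate(f, x, delta, u):
    """Generic one-point estimate: (d / delta) * f(x + delta * u) * u."""
    d = len(x)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * value * ui for ui in u]


def two_point_estimate(f, x, delta, u):
    """Generic two-point estimate: (d / (2 * delta)) * (f(x + delta*u) - f(x - delta*u)) * u."""
    d = len(x)
    plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    return [(d / (2.0 * delta)) * (plus - minus) * ui for ui in u]


if __name__ == "__main__":
    rng = random.Random(0)
    cost = lambda z: sum(c * c for c in z)  # toy cost, assumed for illustration
    x = [1.0, -2.0, 0.5]
    u = random_unit_vector(len(x), rng)
    print("one-point:", one_point_estimate(cost, x, 0.1, u))
    print("two-point:", two_point_estimate(cost, x, 0.1, u))
```

The two-point form subtracts two function values along the same direction, which cancels the large common term that the one-point form carries. That cancellation is the usual intuition for why two-point methods obtain tighter bounds.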
Abstract
In real-world scenarios, risk-averse learning is valuable for mitigating potential adverse outcomes. However, delayed feedback makes it challenging to assess and manage risk effectively. In this paper, we investigate risk-averse learning using Conditional Value at Risk (CVaR) as the risk measure, while incorporating feedback with random but bounded delays. We develop two risk-averse learning algorithms that rely on one-point and two-point zeroth-order optimization approaches, respectively. The dynamic regrets of the algorithms are analyzed in terms of the cumulative delay and the total number of samples. In the absence of delay, the regret bounds match the established bounds of zeroth-order stochastic gradient methods for risk-averse learning. Furthermore, the two-point risk-averse learning algorithm outperforms the one-point algorithm by achieving a smaller regret bound. We provide numerical…
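As a rough illustration of the risk measure named in the abstract, the sketch below estimates CVaR from a finite batch of sampled costs by averaging the worst fraction of them. The convention used here (alpha is the tail fraction, and larger costs are worse) is an assumption, since the summary does not state the paper's convention. The function name `empirical_cvar` is hypothetical.

```python
import math


def empirical_cvar(costs, alpha):
    """Average of the worst ceil(alpha * n) sampled costs.

    Assumed convention: alpha in (0, 1] is the tail fraction and higher
    cost is worse. alpha = 1 reduces to the plain sample mean.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(costs, reverse=True)  # worst (largest) costs first
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(empirical_cvar(samples, 0.25))  # mean of the two worst costs: 7.5
    print(empirical_cvar(samples, 1.0))   # plain mean: 4.5
```

With delayed feedback, the batch of costs available at a given round may be incomplete, which is one reason estimating this quantity becomes harder in the setting the abstract describes.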
Taxonomy
Topics: Fault Detection and Control Systems · Distributed Sensor Networks and Detection Algorithms · Neural Networks and Applications
