Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus $Q$-learning
Loris Cannelli, Giuseppe Nuti, Marzio Sala, Oleg Szehr

TL;DR
This paper compares reinforcement learning approaches for hedging in financial markets, showing that a contextual $k$-armed bandit model offers more realistic, sample-efficient, and adaptable strategies than traditional $Q$-learning, especially in real-world data scenarios.
Contribution
It introduces a risk-averse contextual $k$-armed bandit framework for hedging, demonstrating its advantages over $Q$-learning and traditional models in practical financial settings.
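To make the idea of a risk-averse contextual $k$-armed bandit concrete, the following is a minimal tabular sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the class name `RiskAverseContextualBandit`, the mean-minus-standard-deviation score, the epsilon-greedy exploration, the discretized context, and the toy Profit and Loss model in `train_toy_bandit` are hypothetical and do not come from the paper.

```python
import math
import random


class RiskAverseContextualBandit:
    """Tabular contextual k-armed bandit scored by mean - risk_aversion * std.

    Illustrative assumption only: the paper's actual objective and
    function approximation are not described in this summary.
    """

    def __init__(self, n_contexts, arms, risk_aversion=2.0, epsilon=0.1, seed=0):
        self.arms = list(arms)  # candidate hedge ratios
        self.risk_aversion = risk_aversion
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        k = len(self.arms)
        # Running statistics per (context, arm), updated with Welford's method.
        self.count = [[0] * k for _ in range(n_contexts)]
        self.mean = [[0.0] * k for _ in range(n_contexts)]
        self.m2 = [[0.0] * k for _ in range(n_contexts)]

    def score(self, context, arm):
        n = self.count[context][arm]
        if n < 2:
            return math.inf  # try every arm at least twice
        var = max(self.m2[context][arm] / (n - 1), 0.0)
        return self.mean[context][arm] - self.risk_aversion * math.sqrt(var)

    def greedy_arm(self, context):
        return max(range(len(self.arms)), key=lambda a: self.score(context, a))

    def select(self, context):
        # Epsilon-greedy: explore uniformly with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return self.greedy_arm(context)

    def update(self, context, arm, reward):
        self.count[context][arm] += 1
        n = self.count[context][arm]
        delta = reward - self.mean[context][arm]
        self.mean[context][arm] += delta / n
        self.m2[context][arm] += delta * (reward - self.mean[context][arm])


def train_toy_bandit(rounds=20000, seed=0):
    """Toy one-step hedging game (hypothetical, not the paper's simulator)."""
    true_hedge = [0.25, 0.5, 0.75]  # ideal hedge ratio per context bucket
    arms = [0.0, 0.25, 0.5, 0.75, 1.0]
    bandit = RiskAverseContextualBandit(len(true_hedge), arms, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        context = env.randrange(len(true_hedge))
        arm = bandit.select(context)
        price_move = env.gauss(0.0, 1.0)
        # Hedged P&L is the residual exposure times the price move.
        pnl = (arms[arm] - true_hedge[context]) * price_move
        bandit.update(context, arm, pnl)
    return [arms[bandit.greedy_arm(c)] for c in range(len(true_hedge))]
```

In this toy setting every arm has zero expected P&L, so only the risk penalty separates them; calling `train_toy_bandit()` should therefore settle on the hedge ratio with the smallest P&L variance in each context. Because each decision is scored on its own one-step outcome, no bootstrapped value estimate is needed, which is the structural difference from $Q$-learning.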
Findings
The $k$-armed bandit model outperforms $Q$-learning in sample efficiency.
The approach aligns with the Profit and Loss hedging framework.
It reduces to Black-Scholes in idealized conditions.
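For reference, the Black-Scholes benchmark that the last finding refers to can be sketched as the standard delta of a European call, $\Delta = N(d_1)$. The summary does not say which contract the paper considers, so the choice of a non-dividend-paying European call, the function names, and the parameter values below are assumptions for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call_delta(spot, strike, maturity, rate, vol):
    """Black-Scholes delta of a European call on a non-dividend-paying asset.

    maturity is in years; rate and vol are annualized.
    """
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    return norm_cdf(d1)


# At-the-money call, zero rate, 20% volatility, one year to maturity:
# d1 = 0.1, so the delta is N(0.1), roughly 0.54.
delta = bsm_call_delta(spot=100.0, strike=100.0, maturity=1.0, rate=0.0, vol=0.2)
```

A learned hedger that "reduces to Black-Scholes" would be expected to choose hedge ratios close to this delta when there are no transaction costs and rebalancing is frequent.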
Abstract
The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but it is also undesirable due to high transaction costs. A variety of methods have been proposed to balance between effective replication and losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, where particular attention was given to Recurrent Neural Network systems and variations of the $Q$-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent was trained solely on simulated data, the run-time performance…
