TL;DR
This paper introduces a discrete-time option pricing model based on Q-Learning and rooted in the Black-Scholes-Merton framework. Once set up in a parametric setting, the model can go model-free and learn option prices and hedges directly from data.
Contribution
It constructs a risk-adjusted Markov Decision Process for a discrete-time version of the Black-Scholes-Merton model, in which the option price is the optimal Q-function and the optimal hedge is its second argument, and solves it with Q-Learning and related methods.
Findings
The model can learn option prices and hedges directly from data.
The approach bridges classical finance and RL, enabling model-free pricing.
The model is simple, computationally efficient, and suitable for benchmarking RL algorithms.
Abstract
This paper presents a discrete-time option pricing model that is rooted in Reinforcement Learning (RL), and more specifically in the famous Q-Learning method of RL. We construct a risk-adjusted Markov Decision Process for a discrete-time version of the classical Black-Scholes-Merton (BSM) model, where the option price is an optimal Q-function, while the optimal hedge is a second argument of this optimal Q-function, so that both the price and hedge are parts of the same formula. Pricing is done by learning to dynamically optimize risk-adjusted returns for an option replicating portfolio, as in the Markowitz portfolio theory. Using Q-Learning and related methods, once created in a parametric setting, the model is able to go model-free and learn to price and hedge an option directly from data, and without an explicit model of the world. This suggests that RL may provide efficient…
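The idea of pricing by optimizing risk-adjusted returns of a replicating portfolio can be illustrated with a single time step. The sketch below is not the paper's algorithm: it assumes one period, a European put, lognormal simulated prices, a hedge chosen purely to minimize the variance of the replicating portfolio, and a price defined as the mean discounted portfolio value plus a variance penalty. The function name `one_step_hedge_and_price` and all parameters (including `risk_aversion`) are hypothetical and chosen for illustration only.

```python
import math
import random


def one_step_hedge_and_price(s0, strike, r, mu, sigma, T,
                             risk_aversion, n_paths=20000, seed=0):
    """One-period mean-variance hedge and risk-adjusted price of a put.

    Illustrative assumptions (not taken from the paper):
      - the stock is lognormal with drift mu and volatility sigma,
      - the hedge minimizes the variance of the replicating portfolio,
      - price = mean(portfolio) + risk_aversion * variance(portfolio).
    """
    rng = random.Random(seed)
    growth = math.exp(r * T)  # risk-free growth factor over the period

    payoffs, excess_moves = [], []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s1 = s0 * math.exp((mu - 0.5 * sigma ** 2) * T
                           + sigma * math.sqrt(T) * z)
        payoffs.append(max(strike - s1, 0.0))   # put payoff at expiry
        excess_moves.append(s1 - growth * s0)   # stock move over risk-free

    n = float(n_paths)
    mean_p = sum(payoffs) / n
    mean_d = sum(excess_moves) / n
    cov = sum((p - mean_p) * (d - mean_d)
              for p, d in zip(payoffs, excess_moves)) / n
    var_d = sum((d - mean_d) ** 2 for d in excess_moves) / n

    # Variance-minimizing hedge: regression slope of payoff on stock move.
    hedge = cov / var_d

    # Discounted value of the hedged portfolio on each simulated path.
    portfolio = [(p - hedge * d) / growth
                 for p, d in zip(payoffs, excess_moves)]
    mean_pi = sum(portfolio) / n
    var_pi = sum((v - mean_pi) ** 2 for v in portfolio) / n

    # Risk-adjusted price: expected cost plus a penalty for residual risk.
    price = mean_pi + risk_aversion * var_pi
    return hedge, price


if __name__ == "__main__":
    hedge, price = one_step_hedge_and_price(
        s0=100.0, strike=100.0, r=0.03, mu=0.05, sigma=0.2, T=1.0,
        risk_aversion=0.001)
    print("hedge:", hedge)
    print("risk-adjusted price:", price)
```

In this simplified setting the risk-aversion parameter only shifts the price, not the hedge. A multi-period version would apply the same step backward in time, which is where a Q-function over state and hedge becomes the natural object to learn.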
Taxonomy
Methods: Q-Learning
