Two Kinds of Learning Algorithms for Continuous-Time VWAP Targeting Execution
Xingyu Zhou, Wenbin Chen, Mingyu Xu

TL;DR
This paper develops and compares reinforcement learning and adaptive dynamic programming algorithms for continuous-time VWAP trading execution, providing theoretical guarantees and empirical validation across different market environments.
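The execution target throughout is the volume-weighted average price (VWAP). It is the sum of price times volume over a trading period, divided by total volume. The sketch below computes it for a hypothetical three-bin trading day; the numbers and function name are illustrative only and do not come from the paper. Trading the same fraction of every bin's market volume reproduces the market VWAP exactly, and a VWAP-targeting strategy tries to keep the gap between its own average price and this benchmark small.

```python
from typing import Sequence


def vwap(prices: Sequence[float], volumes: Sequence[float]) -> float:
    """Volume-weighted average price: sum(p_i * v_i) / sum(v_i)."""
    if len(prices) != len(volumes):
        raise ValueError("prices and volumes must have the same length")
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total


# Hypothetical three-bin trading day (illustrative numbers, not from the paper).
market_prices = [100.0, 101.0, 102.0]
market_volumes = [200.0, 100.0, 100.0]
# A trader who buys exactly 25% of each bin's market volume.
my_volumes = [0.25 * v for v in market_volumes]

market_vwap = vwap(market_prices, market_volumes)
my_vwap = vwap(market_prices, my_volumes)
slippage = my_vwap - market_vwap  # execution price minus benchmark
```

Here both averages equal 100.75, so the slippage is zero. Any deviation from volume-proportional trading would open a gap.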
Contribution
It introduces a relaxed stochastic optimization framework with an explicit Gaussian optimal policy and extends RL to jump processes, with convergence proofs and practical algorithms.
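In entropy-regularized (relaxed) control, the optimal randomized policy is a Gibbs density proportional to exp(-H(u)/λ), where H is the Hamiltonian in the control u and λ > 0 is the temperature. When H is quadratic in the trading rate u, completing the square shows that this density is Gaussian, and its mean is the minimizer of the unregularized H. The sketch below assumes a cost-minimization convention with H(u) = a·u² + b·u. The coefficients a, b, λ and the function names are hypothetical placeholders; the paper's actual expressions depend on its model and are not reproduced here.

```python
import math
import random
from typing import Tuple


def gibbs_gaussian_params(a: float, b: float, lam: float) -> Tuple[float, float]:
    """Mean and variance of the density proportional to exp(-(a*u**2 + b*u) / lam).

    Completing the square: a*u**2 + b*u = a*(u + b/(2a))**2 - b**2/(4a), so the
    normalized density is Gaussian with mean -b/(2a) and variance lam/(2a).
    """
    if a <= 0.0 or lam <= 0.0:
        raise ValueError("need a > 0 (convex cost in u) and lam > 0 (temperature)")
    return -b / (2.0 * a), lam / (2.0 * a)


def sample_action(a: float, b: float, lam: float, rng: random.Random) -> float:
    """Draw one exploratory trading rate from the Gaussian Gibbs policy."""
    mean, var = gibbs_gaussian_params(a, b, lam)
    return rng.gauss(mean, math.sqrt(var))


rng = random.Random(0)
# Hypothetical coefficients: the unregularized argmin of 2u^2 - 4u is u = 1.
mean, var = gibbs_gaussian_params(2.0, -4.0, 0.5)
draws = [sample_action(2.0, -4.0, 0.5, rng) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)
```

The variance λ/(2a) grows with the temperature, which controls how much exploration the regularizer buys. As λ → 0, the policy collapses onto the deterministic minimizer.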
Findings
Both the RL and ADP algorithms converge in the tested environments.
ADP performs better in environments with strong price impact.
RL algorithms learn directly from interaction with the environment, without assuming a market model.
Abstract
The optimal execution problem has long been an active research topic, and many reinforcement learning (RL) algorithms have been studied for it. In this article, we consider the execution problem of targeting the volume-weighted average price (VWAP) and propose a relaxed stochastic optimization problem with an entropy regularizer to encourage exploration. We derive an explicit formula for the optimal policy, which is Gaussian, with its mean equal to the solution of the original problem. Extending the framework of continuous-time RL to processes with jumps, we provide theoretical proofs for two RL algorithms. The first minimizes the martingale loss function, which yields the optimal parameter estimates in the mean-square sense; the second uses the martingale orthogonality condition. In addition to the RL algorithms, we also propose another learning…
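Both RL criteria come from the martingale approach to continuous-time policy evaluation. Under the true value function, J(t, X_t) + ∫_0^t r ds is a martingale. The martingale loss is ML(θ) = ½ E ∫_0^T (G_t − J_θ(t, X_t))² dt, where G_t = ∫_t^T r ds + h(X_T) is the realized cost-to-go. The orthogonality condition asks that E Σ_i ξ_i (J(t_{i+1}) − J(t_i) + r_i Δt) = 0 for test functions ξ_i. The sketch below assumes several things: a cost-minimization convention, a linear parameterization J_θ = θ·φ(t, x), and a running cost r that, in the entropy-regularized problem, would include the entropy term. It also takes the test function ξ_i = φ(t_i, x_i) and works on one already-discretized path, so whether that path came from a diffusion or a jump process does not enter the code. All names and the toy data are hypothetical; this is not the paper's implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical feature map for a linear value function J_theta(t, x) = theta . phi(t, x).
Features = Callable[[float, float], List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cost_to_go(rewards: Sequence[float], terminal: float, dt: float) -> List[float]:
    """Left-point discretization of G_i = int_{t_i}^T r ds + h(X_T)."""
    g = [0.0] * len(rewards)
    acc = terminal
    for i in range(len(rewards) - 1, -1, -1):
        acc += rewards[i] * dt
        g[i] = acc
    return g


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - dot(m[r][r + 1:n], x[r + 1:n])) / m[r][r]
    return x


def fit_martingale_loss(phi: Features, times, states, rewards, terminal, dt) -> List[float]:
    """Minimize 1/2 * sum_i (G_i - theta . phi_i)^2 * dt via its normal equations."""
    g = cost_to_go(rewards, terminal, dt)
    feats = [phi(t, x) for t, x in zip(times, states)]
    k = len(feats[0])
    a = [[sum(f[p] * f[q] * dt for f in feats) for q in range(k)] for p in range(k)]
    b = [sum(f[p] * gi * dt for f, gi in zip(feats, g)) for p in range(k)]
    return solve(a, b)


def td_orthogonality(phi: Features, times, states, rewards, terminal, dt,
                     lr: float = 0.5, iters: int = 2000) -> List[float]:
    """Batch stochastic approximation of sum_i phi_i * dM_i = 0, where
    dM_i = J(t_{i+1}) - J(t_i) + r_i * dt and J at the horizon is the terminal cost."""
    feats = [phi(t, x) for t, x in zip(times, states)]
    n, k = len(feats), len(feats[0])
    theta = [0.0] * k
    for _ in range(iters):
        step = [0.0] * k
        for i in range(n):
            j_now = dot(theta, feats[i])
            j_next = dot(theta, feats[i + 1]) if i + 1 < n else terminal
            dm = j_next - j_now + rewards[i] * dt  # martingale increment
            for p in range(k):
                step[p] += feats[i][p] * dm
        theta = [th + lr * s for th, s in zip(theta, step)]
    return theta


# Toy check: constant state and constant running cost c, so J(t) = c * (T - t).
T, n, c = 1.0, 10, 2.0
dt = T / n
times = [i * dt for i in range(n)]
states = [0.0] * n
rewards = [c] * n
phi = lambda t, x: [1.0, T - t]
theta_ml = fit_martingale_loss(phi, times, states, rewards, 0.0, dt)
theta_td = td_orthogonality(phi, times, states, rewards, 0.0, dt)
```

On this toy path, both criteria recover θ ≈ [0, 2], i.e. J(t) = 2(T − t). With noisy paths, the loss fit is a least-squares regression on realized costs-to-go, while the orthogonality update behaves like batch TD(0).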
Taxonomy
Topics: Fault Detection and Control Systems · Advanced Control Systems Optimization · Real-Time Systems Scheduling
