Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization
Xiaodong Li, Pangjing Wu, Chenxin Zou, Qing Li

TL;DR
This paper introduces a hierarchical deep reinforcement learning framework called M3T for optimizing VWAP trading strategies, effectively reducing transaction costs by capturing multi-scale market patterns.
Contribution
It proposes a novel hierarchical architecture combining macro, meta, and micro traders with LSTM forecasting to improve VWAP strategy performance.
Findings
Outperforms baseline strategies in VWAP slippage reduction
Achieves an average cost saving of 1.16 basis points
Demonstrates effectiveness on Shanghai Stock Exchange data
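For readers unfamiliar with the metric behind these findings, the sketch below shows one common way to compute VWAP and express execution slippage against it in basis points. The function names and the buy-side sign convention (positive means the order paid more than the market VWAP) are assumptions made for illustration; the summary does not state the exact definition the paper uses.

```python
from typing import Sequence


def vwap(prices: Sequence[float], volumes: Sequence[float]) -> float:
    """Volume-weighted average price: sum(p * v) / sum(v)."""
    if len(prices) != len(volumes):
        raise ValueError("prices and volumes must have the same length")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def slippage_bps(execution_vwap: float, market_vwap: float) -> float:
    """Slippage of a BUY order versus the market VWAP, in basis points.

    Assumed convention: positive = paid more than the market VWAP.
    One basis point is 0.01%, hence the factor of 10,000.
    """
    if market_vwap <= 0:
        raise ValueError("market VWAP must be positive")
    return (execution_vwap - market_vwap) / market_vwap * 1e4


if __name__ == "__main__":
    # Hypothetical market trades over the execution window.
    market = vwap([10.0, 11.0], [100, 300])
    # Hypothetical fills achieved by the strategy.
    achieved = vwap([10.7, 10.8], [200, 200])
    print(f"market VWAP   = {market:.4f}")
    print(f"achieved VWAP = {achieved:.4f}")
    print(f"slippage      = {slippage_bps(achieved, market):.2f} bps")
```

Under this convention, a cost saving relative to a baseline would be the baseline's slippage minus the strategy's slippage.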
Abstract
Designing an intelligent volume-weighted average price (VWAP) strategy is a critical concern for brokers, since traditional rule-based strategies are relatively static and cannot achieve a lower transaction cost in a dynamic market. Many studies have tried to minimize the cost via reinforcement learning, but improvement has hit bottlenecks, especially for long-duration strategies such as the VWAP strategy. To address this issue, we propose an architecture that joins deep learning with hierarchical reinforcement learning, termed Macro-Meta-Micro Trader (M3T), to capture market patterns and execute orders at different temporal scales. The Macro Trader first allocates a parent order into tranches based on volume profiles, as the traditional VWAP strategy does, but uses a long short-term memory neural network to improve the forecasting accuracy. Then the Meta Trader selects a short-term…
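The Macro Trader step described in the abstract, splitting a parent order into tranches in proportion to a volume profile, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name allocate_tranches is hypothetical, the profile is a hard-coded stand-in for what the paper obtains from an LSTM forecast, and the largest-remainder rounding is an assumption added so that the tranche sizes sum exactly to the parent order.

```python
from typing import List, Sequence


def allocate_tranches(parent_qty: int, volume_profile: Sequence[float]) -> List[int]:
    """Split parent_qty into integer tranches proportional to volume_profile.

    volume_profile holds non-negative expected volumes (or weights) per
    time bucket. Largest-remainder rounding (an assumption of this sketch)
    guarantees the tranches sum to parent_qty.
    """
    total = sum(volume_profile)
    if parent_qty < 0:
        raise ValueError("parent_qty must be non-negative")
    if total <= 0 or any(v < 0 for v in volume_profile):
        raise ValueError("volume_profile must be non-negative with a positive sum")

    # Ideal (fractional) share of the order for each bucket.
    raw = [parent_qty * v / total for v in volume_profile]
    # Round down first; values are non-negative so int() is a floor.
    sizes = [int(x) for x in raw]
    # Hand out the leftover shares to the buckets with the largest remainders.
    leftover = parent_qty - sum(sizes)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Hypothetical U-shaped intraday profile (heavier at open and close).
    profile = [30, 10, 10, 20, 30]
    print(allocate_tranches(1000, profile))
```

In a hierarchical setup like the one the abstract outlines, each tranche would then be handed to lower-level components for execution within its time bucket.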
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Markets and Investment Strategies · Complex Systems and Time Series Analysis
Methods: Balanced Selection
