Adversarial Deep Reinforcement Learning in Portfolio Management
Zhipeng Liang, Hao Chen, Junhao Zhu, Kangkang Jiang, Yanran Li

TL;DR
This paper evaluates three deep reinforcement learning algorithms for portfolio management, introduces an adversarial training method, and shows that Policy Gradient (PG) outperforms the other two algorithms and the Uniform Constant Rebalanced Portfolio (UCRP) benchmark in Chinese stock market simulations.
Contribution
It implements and compares DDPG, PPO, and PG algorithms in portfolio management, introduces an adversarial training method, and shows PG with adversarial training outperforms traditional methods and UCRP.
Findings
PG outperforms DDPG and PPO in the Chinese stock market.
Adversarial training improves training efficiency and financial metrics.
Policy Gradient with adversarial training surpasses UCRP in backtests.
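For readers unfamiliar with the UCRP benchmark named above, the following is a minimal sketch of how its cumulative wealth is typically computed: the portfolio is rebalanced to equal weights every period. The function name `ucrp_wealth`, the input layout, and the zero-transaction-cost simplification are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def ucrp_wealth(price_relatives):
    """Cumulative wealth of a uniform constant rebalanced portfolio.

    price_relatives: array of shape (T, N); entry [t, i] is the ratio
    close_t / close_{t-1} for asset i. Transaction costs are ignored
    (an assumption of this sketch).
    """
    x = np.asarray(price_relatives, dtype=float)
    n_assets = x.shape[1]
    # Equal weight on every asset, restored at the start of each period.
    weights = np.full(n_assets, 1.0 / n_assets)
    # Portfolio growth factor in each period.
    period_growth = x @ weights
    # Wealth compounds multiplicatively, starting from 1.0.
    return np.cumprod(period_growth)

# Hypothetical two-asset example over two periods.
relatives = np.array([[1.2, 1.0],
                      [0.5, 1.5]])
wealth = ucrp_wealth(relatives)
```

A learned policy is compared against this curve by computing its own per-period weights instead of the fixed uniform vector.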
Abstract
In this paper, we implement three state-of-the-art continuous reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO) and Policy Gradient (PG), in portfolio management. All of them are widely used in game playing and robot control. Moreover, PPO has appealing theoretical properties that make it potentially promising for portfolio management. We present their performance under different settings, including different learning rates, objective functions, and feature combinations, in order to provide insights for parameter tuning, feature selection and data preparation. We also conduct intensive experiments in the Chinese stock market and show that PG is more desirable in financial markets than DDPG and PPO, although both of them are more advanced. Furthermore, we propose a so-called Adversarial Training method and show that it can greatly improve…
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Markets and Investment Strategies · Advanced Bandit Algorithms Research
Methods: Experience Replay · Entropy Regularization · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient · Proximal Policy Optimization
