Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation
Chung I Lu

TL;DR
This paper evaluates deep reinforcement learning algorithms for portfolio optimisation using simulated data, highlighting the strengths of on-policy methods like PPO in noisy environments and the challenges faced by off-policy algorithms.
Contribution
It provides a comprehensive comparison of RL algorithms in a simulated portfolio setting, introducing regime-aware policies and analyzing sample complexity issues.
Findings
PPO and A2C outperform DDPG, TD3, and SAC in noisy reward environments.
The clipping variant of PPO keeps the policy from drifting away from the optimum once it has converged.
Regime-aware PPO adapts to market changes effectively.
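The clipping finding above refers to PPO's clipped surrogate objective. A minimal NumPy sketch of that objective follows, as published in the original PPO formulation rather than taken from this paper's code. The function name `ppo_clip_objective` and the default `eps=0.2` are assumptions chosen for illustration; the paper's own hyperparameters are not stated here.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective, averaged over samples.

    ratio     : pi_new(a|s) / pi_old(a|s) for each sample
    advantage : advantage estimate for each sample
    eps       : clip range (0.2 is an illustrative default, not the paper's value)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum removes any incentive to move the
    # ratio outside [1 - eps, 1 + eps] in the direction that would
    # otherwise increase the objective.
    return np.minimum(unclipped, clipped).mean()
```

Once the probability ratio leaves the clip range in the favourable direction, the objective is flat in the ratio, so the gradient no longer pushes the policy further. This may be why clipping helps hold a converged policy in place when rewards are noisy.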
Abstract
We evaluate benchmark deep reinforcement learning algorithms on the task of portfolio optimisation using simulated data. The simulator to generate the data is based on correlated geometric Brownian motion with the Bertsimas-Lo market impact model. Using the Kelly criterion (log utility) as the objective, we can analytically derive the optimal policy without market impact as an upper bound to measure performance when including market impact. We find that the off-policy algorithms DDPG, TD3 and SAC are unable to learn the right Q-function due to the noisy rewards and therefore perform poorly. The on-policy algorithms PPO and A2C, with the use of generalised advantage estimation, are able to deal with the noise and derive a close to optimal policy. The clipping variant of PPO was found to be important in preventing the policy from deviating from the optimal once converged. In a more…
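To make the analytic benchmark concrete, the sketch below uses the standard textbook result for log utility under correlated geometric Brownian motion with continuous rebalancing and no market impact: the expected log-growth rate is g(w) = r + w'(mu - r) - 0.5 w'Σw, which is maximised at w* = Σ^{-1}(mu - r). This is an unconstrained illustration and not necessarily the exact formulation used in the paper. The function names, the risk-free rate `r`, and the example numbers are all assumptions.

```python
import numpy as np

def kelly_weights(mu, cov, r=0.0):
    """Unconstrained Kelly-optimal weights: solve cov @ w = mu - r."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return np.linalg.solve(cov, mu - r)

def growth_rate(w, mu, cov, r=0.0):
    """Expected log-growth rate of a continuously rebalanced portfolio."""
    w = np.asarray(w, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return r + w @ (mu - r) - 0.5 * w @ cov @ w

def simulate_gbm(s0, mu, cov, dt, n_steps, rng):
    """Simulate correlated GBM price paths (no market impact).

    Returns an array of shape (n_steps + 1, n_assets) whose first row is s0.
    """
    s0 = np.asarray(s0, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    chol = np.linalg.cholesky(cov)               # cov = chol @ chol.T
    z = rng.standard_normal((n_steps, len(mu)))  # independent normal shocks
    drift = (mu - 0.5 * np.diag(cov)) * dt       # Ito-corrected log drift
    shocks = np.sqrt(dt) * z @ chol.T            # correlated increments
    log_paths = np.cumsum(drift + shocks, axis=0)
    return s0 * np.exp(np.vstack([np.zeros(len(mu)), log_paths]))

# Illustrative (made-up) parameters for two assets.
mu = np.array([0.08, 0.05])
cov = np.array([[0.04, 0.01], [0.01, 0.09]])
w_star = kelly_weights(mu, cov, r=0.01)
paths = simulate_gbm([100.0, 100.0], mu, cov, dt=1 / 252, n_steps=252,
                     rng=np.random.default_rng(0))
```

Because g(w) is a concave quadratic in w, any other allocation has a lower expected growth rate in this frictionless setting, which is what lets the impact-free solution serve as an upper bound on what a learned policy can achieve.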
Taxonomy
Topics: Advanced Bandit Algorithms Research · Financial Markets and Investment Strategies · Reservoir Engineering and Simulation Methods
