Portfolio Reinforcement Learning with Scenario-Context Rollout
Vanya Priscillia Bendatu, Yao Lu

TL;DR
This paper introduces a scenario-context rollout (SCR) method for portfolio reinforcement learning. SCR generates stress-test scenarios to keep policies stable during market regime shifts, with reported gains of up to 76% in Sharpe ratio and reductions of up to 53% in maximum drawdown.
Contribution
It proposes a novel counterfactual rollout approach to stabilize RL critic training and effectively incorporate stress scenarios in portfolio management.
Findings
Up to 76% improvement in Sharpe ratio
Up to 53% reduction in maximum drawdown
Effective across 31 diverse market universes
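For readers unfamiliar with the two metrics above, here is a minimal sketch of how they are commonly computed from a series of periodic returns. The paper's exact conventions (risk-free rate, annualization factor, return frequency) are not stated in this summary, so the zero risk-free rate, the 252-day annualization default, and the function names `sharpe_ratio` and `max_drawdown` are assumptions made for illustration.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate (an assumption)."""
    r = np.asarray(returns, dtype=float)
    # Sample standard deviation (ddof=1) of per-period returns.
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a fraction of the peak."""
    r = np.asarray(returns, dtype=float)
    # Wealth starts at 1.0 so that a loss in the first period is counted.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))
```

Under these conventions, a higher Sharpe ratio and a lower maximum drawdown both indicate better risk-adjusted performance.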
Abstract
Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR), which generates plausible next-day multivariate return scenarios under stress events. This raises a new challenge: history never reveals what would have happened differently. As a result, incorporating scenario-based rewards from rollouts introduces a reward-transition mismatch in temporal-difference learning, destabilizing RL critic training. We analyze this inconsistency and show that it leads to a mixed evaluation target. Guided by this analysis, we construct a counterfactual next state from the rollout-implied continuations and use it to augment the critic's bootstrap target. Doing so stabilizes learning and provides a viable bias-variance tradeoff. In out-of-sample evaluations across 31…
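The reward-transition mismatch described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give the exact form of the augmented target, so the function name `td_targets`, the mixing weight `beta`, and the convex blend of the two bootstrap values are all assumptions chosen for illustration.

```python
import numpy as np

def td_targets(r_scen, v_next_real, v_next_cf, gamma=0.99, beta=0.5):
    """Return (mismatched, consistent, blended) one-step TD targets.

    r_scen      : reward computed from a generated stress scenario
    v_next_real : critic value of the historically observed next state
    v_next_cf   : critic value of a counterfactual next state implied by the rollout
    beta        : hypothetical mixing weight in [0, 1] (an assumption, not from the paper)
    """
    r_scen = np.asarray(r_scen, dtype=float)
    v_next_real = np.asarray(v_next_real, dtype=float)
    v_next_cf = np.asarray(v_next_cf, dtype=float)

    # Scenario reward paired with the real next state: reward and transition disagree.
    mismatched = r_scen + gamma * v_next_real
    # Scenario reward paired with the counterfactual next state: both come from the rollout.
    consistent = r_scen + gamma * v_next_cf
    # Illustrative blend of the two bootstrap values, trading bias against variance.
    blended = r_scen + gamma * ((1.0 - beta) * v_next_real + beta * v_next_cf)
    return mismatched, consistent, blended
```

In this sketch, `beta=0` recovers the mismatched target and `beta=1` bootstraps entirely from the counterfactual state; intermediate values are one simple way to picture the bias-variance tradeoff the abstract mentions.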
Taxonomy
Topics: Stock Market Forecasting Methods · Advanced Bandit Algorithms Research · Financial Markets and Investment Strategies
