Portfolio Optimization under Recursive Utility via Reinforcement Learning
Minkey Chang

TL;DR
This paper explores using recursive utility, a risk-sensitive objective, within reinforcement learning for portfolio optimization, and reports improved financial metrics over a standard discounted-reward baseline on real ETF data.
Contribution
It introduces a novel recursive-utility-based reinforcement learning framework for portfolio management that approximates the certainty equivalent by Monte Carlo sampling and trains actor-critic algorithms (PPO, A2C).
Findings
Relative to the discounted baseline, recursive utility improves Sharpe ratio, max drawdown, and cumulative return.
The method outperforms the naive baseline across 10 chronological train/test splits.
The approach is validated on South Korean ETF data.
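The three metrics named in the findings can be computed from a series of per-period portfolio returns. The sketch below uses common textbook definitions and is not taken from the paper, whose exact metric definitions are in its appendix. It assumes simple (not log) returns, a per-period risk-free rate defaulting to zero, and an annualization factor of 252, which presumes daily data; `periods_per_year` and `risk_free` are hypothetical parameter names.

```python
import numpy as np


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio of per-period simple returns.

    Assumption: risk_free is a per-period rate and 252 periods/year (daily data).
    """
    excess = np.asarray(returns, dtype=float) - risk_free
    std = excess.std(ddof=1)  # sample standard deviation
    if std == 0:
        return float("nan")  # undefined for a constant return series
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


def cumulative_return(returns):
    """Total compounded return over the whole period."""
    return float(np.prod(1.0 + np.asarray(returns, dtype=float)) - 1.0)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the wealth curve, as a positive fraction."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    wealth = np.concatenate(([1.0], wealth))  # include the starting wealth of 1
    running_peak = np.maximum.accumulate(wealth)
    drawdowns = 1.0 - wealth / running_peak
    return float(drawdowns.max())
```

For example, per-period returns `[0.1, -0.2, 0.05]` give a cumulative return of -7.6% and a max drawdown of 20%. Higher is better for Sharpe ratio and cumulative return; lower is better for max drawdown.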
Abstract
We study whether recursive utility, a risk-sensitive objective from asset-pricing theory, improves reinforcement learning for portfolio allocation. Under recursive utility, the Bellman equation involves a certainty equivalent (CE) of future value that has no closed form under observed returns; we approximate it with a finite-sample Monte Carlo estimate and train actor-critic algorithms (PPO, A2C) on the resulting value target and on an approximate advantage estimate (AAE) that generalizes the Bellman residual to multiple steps with state-dependent weights. Because it relies on a learned value function, this formulation applies only to critic-based algorithms. On 10 chronological train/test splits of South Korean ETF data, the recursive-utility agent improves on the discounted (naive) baseline in Sharpe ratio, max drawdown, and cumulative return. Derivations, the world model and metric definitions, and full result tables are in the appendices.
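The abstract does not spell out the aggregator or the form of the certainty equivalent, so the following is only a sketch of how a Monte Carlo CE can feed a recursive-utility value target. It assumes an Epstein-Zin-style aggregator V = [(1 - beta) r^(1 - rho) + beta CE(V')^(1 - rho)]^(1 / (1 - rho)) with a power (CRRA) certainty equivalent CE(V') = (E[V'^(1 - gamma)])^(1 / (1 - gamma)). The names `beta` (time preference), `gamma` (risk aversion), `rho` (inverse elasticity of intertemporal substitution), and `next_values` (critic evaluations at sampled next states) are hypothetical, and the AAE is not shown.

```python
import numpy as np


def certainty_equivalent_mc(next_values, gamma):
    """Monte Carlo CE of positive value samples under power (CRRA) risk aversion.

    Assumption: a power CE; the paper's exact CE form is not given in the abstract.
    """
    v = np.asarray(next_values, dtype=float)
    if np.any(v <= 0):
        raise ValueError("power certainty equivalent needs positive values")
    if np.isclose(gamma, 1.0):
        # Limit as gamma -> 1: the geometric mean of the samples
        return float(np.exp(np.mean(np.log(v))))
    return float(np.mean(v ** (1.0 - gamma)) ** (1.0 / (1.0 - gamma)))


def recursive_value_target(reward, next_values, beta, gamma, rho):
    """Hypothetical Epstein-Zin-style critic target built from sampled next values.

    Computes [(1 - beta) * r^(1 - rho) + beta * CE^(1 - rho)]^(1 / (1 - rho)).
    """
    if reward <= 0:
        raise ValueError("this aggregator assumes a positive per-period reward")
    ce = certainty_equivalent_mc(next_values, gamma)
    if np.isclose(rho, 1.0):
        # Cobb-Douglas limit of the aggregator as rho -> 1
        return float(reward ** (1.0 - beta) * ce ** beta)
    blended = (1.0 - beta) * reward ** (1.0 - rho) + beta * ce ** (1.0 - rho)
    return float(blended ** (1.0 / (1.0 - rho)))
```

With `gamma=0` the CE reduces to the sample mean (risk neutrality); any `gamma > 0` pulls it below the mean, which is how the target penalizes dispersion in next-period value. The number of samples trades estimator variance against the cost of extra critic evaluations per update.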
Taxonomy
Topics: Risk and Portfolio Optimization · Advanced Bandit Algorithms Research · Financial Markets and Investment Strategies
