Pontryagin-Guided Policy Optimization for Merton's Portfolio Problem
Jeonggyu Huh, Jaegi Jeon

TL;DR
This paper introduces a novel policy optimization framework for Merton's portfolio problem that integrates Pontryagin's maximum principle with neural network policies, improving convergence and interpretability.
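The classical solution of Merton's problem is a useful reference point. The sketch below makes several assumptions of its own, which are not necessarily the paper's exact setting:

- one risky asset following geometric Brownian motion (drift `mu`, volatility `sigma`);
- a constant risk-free rate `r`;
- CRRA utility with relative risk aversion `gamma` and discount rate `rho`;
- an infinite horizon.

The function name is hypothetical.

```python
import math

def merton_infinite_horizon(mu, r, sigma, gamma, rho):
    """Closed-form Merton policy for CRRA utility u(c) = c**(1 - gamma) / (1 - gamma).

    Assumed textbook setting: GBM risky asset (mu, sigma), constant risk-free
    rate r, discount rate rho, infinite horizon.
    Returns (pi_star, nu): the constant risky-asset fraction of wealth and the
    constant consumption-to-wealth ratio.
    """
    if gamma <= 0 or sigma <= 0:
        raise ValueError("gamma and sigma must be positive")
    pi_star = (mu - r) / (gamma * sigma**2)
    theta = (mu - r) / sigma  # market price of risk
    nu = (rho - (1.0 - gamma) * (r + theta**2 / (2.0 * gamma))) / gamma
    if nu <= 0:
        raise ValueError("nu <= 0: the infinite-horizon problem is not well posed")
    return pi_star, nu

pi_star, nu = merton_infinite_horizon(mu=0.08, r=0.02, sigma=0.2, gamma=2.0, rho=0.05)
print(f"risky fraction = {pi_star:.4f}, consumption/wealth = {nu:.4f}")
```

Under these assumptions both controls are constant. The risky fraction is (mu - r)/(gamma * sigma**2). The consumption-to-wealth ratio reduces to `rho` for log utility (`gamma = 1`). This makes the closed form a natural sanity check for any learned consumption and investment policy.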
Contribution
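The paper proposes PG-DPO (Pontryagin-Guided Direct Policy Optimization), which combines Pontryagin's maximum principle (PMP) with neural-network policies by tracking a policy-fixed backward stochastic differential equation (BSDE) for the adjoint processes. This avoids value-function approximation and improves training stability.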
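The idea of a policy-fixed adjoint can be illustrated in the simplest possible case. This is a toy of our own, not the paper's BSDE solver. It assumes:

- a constant risky fraction `pi` and no consumption;
- CRRA utility of terminal wealth;
- pathwise Monte Carlo derivatives with respect to initial wealth.

For one fixed policy, it estimates the adjoint λ = ∂J/∂x and its derivative ∂λ/∂x. It then plugs them into the standard first-order condition π = -(μ - r)/σ² · λ/(x ∂λ/∂x). All function names and parameter values are hypothetical.

```python
import math
import random

def terminal_wealth(x0, pi, mu, r, sigma, T, z):
    """Exact terminal wealth under a constant risky fraction pi and no consumption."""
    drift = (r + pi * (mu - r) - 0.5 * pi**2 * sigma**2) * T
    return x0 * math.exp(drift + pi * sigma * math.sqrt(T) * z)

def adjoint_estimates(x0, pi, mu, r, sigma, T, gamma, n_paths=100_000, seed=0):
    """Monte Carlo estimates of lam = dJ/dx0 and lam_x = d(lam)/dx0 for the FIXED
    policy pi, where J = E[U(X_T)] and U(x) = x**(1 - gamma) / (1 - gamma).

    X_T is linear in x0, so dX_T/dx0 = X_T / x0 on every path and the second
    pathwise derivative vanishes.
    """
    rng = random.Random(seed)
    lam_sum = lam_x_sum = 0.0
    for _ in range(n_paths):
        x_T = terminal_wealth(x0, pi, mu, r, sigma, T, rng.gauss(0.0, 1.0))
        dx = x_T / x0                                    # dX_T/dx0 on this path
        lam_sum += x_T**(-gamma) * dx                    # U'(X_T) * dX_T/dx0
        lam_x_sum += -gamma * x_T**(-gamma - 1) * dx**2  # U''(X_T) * (dX_T/dx0)**2
    return lam_sum / n_paths, lam_x_sum / n_paths

def pmp_fraction(lam, lam_x, x0, mu, r, sigma):
    """Risky fraction from the first-order condition -(mu - r)/sigma**2 * lam/(x0*lam_x)."""
    return -(mu - r) / sigma**2 * lam / (x0 * lam_x)

mu, r, sigma, gamma, T, x0 = 0.08, 0.02, 0.2, 2.0, 1.0, 1.0
pi_fixed = 0.3  # deliberately suboptimal policy
lam, lam_x = adjoint_estimates(x0, pi_fixed, mu, r, sigma, T, gamma)
print(f"lambda = {lam:.5f}, PMP-implied fraction = {pmp_fraction(lam, lam_x, x0, mu, r, sigma):.4f}")
```

Under CRRA utility, λ(x) is proportional to x^(-γ) for any constant fraction. As a result, the first-order condition here returns the Merton fraction (μ - r)/(γσ²) = 0.75 even though the simulated policy uses 0.3. In richer settings, the adjoint of the current policy generally yields an improved policy, not necessarily the optimum.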
Findings
Handles both consumption and investment policies effectively.
Achieves strong performance without large offline datasets.
Improves convergence speed and interpretability.
Abstract
We present a Pontryagin-Guided Direct Policy Optimization (PG-DPO) framework for Merton's portfolio problem, unifying modern neural-network-based policy parameterization with the adjoint viewpoint from Pontryagin's maximum principle (PMP). Instead of approximating the value function (as done in deep BSDE methods), we track a policy-fixed BSDE for the adjoint processes, which allows each gradient update to align with continuous-time PMP conditions. This setup yields locally optimal consumption and investment policies that are closely tied to classical stochastic control. We further incorporate an alignment penalty that nudges the learned policy toward Pontryagin-derived solutions, enhancing both convergence speed and training stability. Numerical experiments confirm that PG-DPO effectively handles both consumption and investment, achieving strong performance and interpretability without…
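A gradient update with an alignment penalty can be sketched in a toy setting. This is a hypothetical simplification, not the authors' algorithm:

- a single scalar `pi` stands in for the neural policy;
- there is no consumption;
- the penalty weight `alpha`, the learning rate `lr` and the market parameters are made up.

Each step ascends the Monte Carlo objective minus `alpha * (pi - pi_pmp)**2`. Here `pi_pmp` comes from the current policy's pathwise adjoint through the first-order condition and is treated as a constant (no gradient flows through it).

```python
import math
import random

def train_constant_fraction(mu=0.08, r=0.02, sigma=0.2, gamma=2.0, T=1.0, x0=1.0,
                            pi_init=0.3, alpha=0.5, lr=1.0, steps=50,
                            n_paths=10_000, seed=0):
    """Toy direct policy optimization of a constant risky fraction pi.

    Ascends J(pi) - alpha * (pi - pi_pmp)**2 with J(pi) = E[U(X_T)], CRRA utility
    U(x) = x**(1 - gamma) / (1 - gamma), and pi_pmp held fixed within each step.
    """
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n_paths)]  # common random numbers
    sqrt_T = math.sqrt(T)
    pi = pi_init
    for _ in range(steps):
        grad_J = lam = lam_x = 0.0
        for z in zs:
            x_T = x0 * math.exp((r + pi * (mu - r) - 0.5 * pi**2 * sigma**2) * T
                                + pi * sigma * sqrt_T * z)
            u_prime = x_T**(-gamma)
            # pathwise dU/dpi, using dX_T/dpi = X_T * ((mu - r - pi*sigma**2)*T + sigma*sqrt(T)*z)
            grad_J += u_prime * x_T * ((mu - r - pi * sigma**2) * T + sigma * sqrt_T * z)
            dx = x_T / x0                                   # dX_T/dx0 (X_T is linear in x0)
            lam += u_prime * dx                             # adjoint sum; 1/n cancels below
            lam_x += -gamma * x_T**(-gamma - 1) * dx**2
        grad_J /= n_paths
        pi_pmp = -(mu - r) / sigma**2 * lam / (x0 * lam_x)  # alignment target, no gradient
        pi += lr * (grad_J - 2.0 * alpha * (pi - pi_pmp))
    return pi

pi_penalized = train_constant_fraction(alpha=0.5)
pi_plain = train_constant_fraction(alpha=0.0)
print(f"with penalty: {pi_penalized:.4f}, without: {pi_plain:.4f}")
```

In this toy, the penalty adds curvature around `pi_pmp`. With `alpha = 0.5`, the iterate settles near the Merton fraction 0.75 within a few steps. The unpenalized run tends to approach it more slowly and with more Monte Carlo error. This is only an illustration of the mechanism under the stated assumptions.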
Taxonomy
Topics: Financial Markets and Investment Strategies · Credit Risk and Financial Regulations · Monetary Policy and Economic Impact
