Policy Learning for Perturbance-wise Linear Quadratic Control Problem

Haoran Zhang; Wenhao Zhang; Xianping Wu

arXiv:2511.07388·math.OC·November 11, 2025

Open Access

TL;DR

This paper develops a unified perturbation-aware control framework combining classical, affine, and distributionally robust models, with a policy gradient method that converges globally and is validated on financial data.

Contribution

The paper introduces an augmented affine policy representation for perturbance-wise control, which handles implementation constraints and model uncertainty within a single scheme.

Findings

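01. The policy gradient method converges globally under constant stepsizes chosen as simple polynomials of the problem parameters.

02. Numerical experiments on mean-variance portfolio allocation and dynamic benchmark tracking with real data show stable convergence and sensitivity tradeoffs.

03. The approach accommodates nonzero noise means estimated from data and model uncertainty described by a Wasserstein ambiguity set.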

Abstract

We study finite-horizon linear quadratic control with additive noise in a perturbance-wise framework that unifies the classical model, a constraint-embedded affine policy class, and a distributionally robust formulation with a Wasserstein ambiguity set. Based on an augmented affine representation, we model feasibility as an affine perturbation and unknown noise as a distributional perturbation estimated from samples, thereby addressing constrained implementation and model uncertainty in a single scheme. First, we construct an implementable policy gradient method that accommodates nonzero noise means estimated from data. Second, we analyze its convergence under constant stepsizes chosen as simple polynomials of the problem parameters, ensuring global decrease of the value function. Finally, we report numerical studies on mean-variance portfolio allocation and dynamic benchmark tracking with real data, validating…
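To make the setting in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes dynamics x_{t+1} = A x_t + B u_t + w_t with noise mean mu and covariance W, and an affine policy u_t = F_t [x_t; 1] written on the augmented state. The expected cost is evaluated exactly by propagating the second moment of the augmented state. The gradient is approximated by central finite differences, which stand in for the gradient expressions derived in the paper. The system matrices, the function names, and the stepsize 0.05 are all hypothetical choices for illustration; the stepsize in particular is not the polynomial rule analyzed in the paper, and the Wasserstein ambiguity set is not modeled here.

```python
import numpy as np

def cost(F, A, B, Q, R, mu, W, S0):
    """Exact expected cost of the affine policy u_t = F[t] @ [x_t; 1]."""
    n = A.shape[0]
    Qa = np.zeros((n + 1, n + 1)); Qa[:n, :n] = Q   # state cost lifted to z = [x; 1]
    Wa = np.zeros((n + 1, n + 1)); Wa[:n, :n] = W   # noise covariance lifted to z
    S, total = S0.copy(), 0.0                       # S = E[z_t z_t^T]
    for Ft in F:
        # Closed-loop augmented dynamics: z_{t+1} = Acl z_t + [w_t - mu; 0]
        Acl = np.zeros((n + 1, n + 1))
        Acl[:n, :] = np.hstack([A, mu.reshape(-1, 1)]) + B @ Ft
        Acl[n, n] = 1.0
        total += np.trace((Qa + Ft.T @ R @ Ft) @ S)  # E[x'Qx + u'Ru] at time t
        S = Acl @ S @ Acl.T + Wa
    return total + np.trace(Qa @ S)                  # terminal cost E[x_T' Q x_T]

def fd_grad(F, eps, *args):
    """Central finite-difference gradient (a stand-in for an analytic gradient)."""
    g = np.zeros_like(F)
    for idx in np.ndindex(*F.shape):
        d = np.zeros_like(F); d[idx] = eps
        g[idx] = (cost(F + d, *args) - cost(F - d, *args)) / (2 * eps)
    return g

def policy_gradient(F, step, iters, *args):
    """Gradient descent on the policy parameters with a constant stepsize."""
    history = [cost(F, *args)]
    for _ in range(iters):
        F = F - step * fd_grad(F, 1e-5, *args)
        history.append(cost(F, *args))
    return F, history

# Hypothetical two-state, one-input system (not taken from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
mu = np.array([0.05, -0.02])      # nonzero noise mean, e.g. a sample average
W = 0.01 * np.eye(2)              # noise covariance
z0 = np.array([1.0, 0.0, 1.0])    # augmented initial state [x_0; 1]
S0 = np.outer(z0, z0)
T = 5
F0 = np.zeros((T, 1, 3))          # one gain [K_t, b_t] per time step

F_opt, history = policy_gradient(F0, 0.05, 100, A, B, Q, R, mu, W, S0)
print("initial cost:", history[0], "final cost:", history[-1])
```

Appending the constant 1 to the state lets the feedback gain and the offset be updated as a single matrix per time step, which is one way to read the "augmented affine representation" named in the abstract.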

Taxonomy

Topics: Risk and Portfolio Optimization · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics