Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective

Yansong Li; Shuo Han

arXiv:2203.11424 · math.OC · March 23, 2022 · L4DC

TL;DR

This paper introduces a hybrid algorithm that combines model-based and model-free gradients to solve nonlinear optimal control problems more efficiently when the dynamics are a linear model plus a small nonlinear perturbation.

Contribution

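It formulates policy optimization as a composite optimization problem: the cost is split into an explicit term derived from an approximate linear model and a black-box term representing the unknown modeling error. Exploiting this split reduces the number of function evaluations needed.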
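The decomposition can be written out as below. The symbols (system matrices A and B, perturbation d, weights Q and R, horizon T, policy parameters θ, and the names J_lin and h) are illustrative notation chosen here, not necessarily the paper's own:

```latex
\begin{aligned}
x_{t+1} &= A x_t + B u_t + d(x_t, u_t), \qquad d \text{ small and unknown},\\
J(\theta) &= \sum_{t=0}^{T-1} \left( x_t^\top Q x_t + u_t^\top R u_t \right),\\
J(\theta) &= \underbrace{J_{\mathrm{lin}}(\theta)}_{\text{explicit, from } (A,\,B)}
  + \underbrace{h(\theta)}_{\text{black box}},
  \qquad h(\theta) := J(\theta) - J_{\mathrm{lin}}(\theta).
\end{aligned}
```

Here J_lin is the same quadratic cost evaluated along trajectories of the linear model (d ≡ 0), so its gradient has a closed form. The term h can only be probed by evaluating J itself, for example by simulating or running the true system.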

Findings

01. Reduces the number of function evaluations compared with traditional model-free methods.

02. Improves optimization efficiency, as shown theoretically.

03. Validated in practice on control problems whose dynamics are linear plus a small nonlinear perturbation.

Abstract

We develop an algorithm that combines model-based and model-free methods for solving a nonlinear optimal control problem with a quadratic cost in which the system model is given by a linear state-space model with a small additive nonlinear perturbation. We decompose the cost into a sum of two functions, one having an explicit form obtained from the approximate linear model, the other being a black-box model representing the unknown modeling error. The decomposition allows us to formulate the problem as a composite optimization problem. To solve the optimization problem, our algorithm performs gradient descent using the gradient obtained from the approximate linear model until backtracking line search fails, upon which the model-based gradient is compared with the exact gradient obtained from a model-free algorithm. The difference between the model gradient and the exact gradient is then…
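The loop described in the abstract can be sketched in Python/NumPy, but only under several loudly stated assumptions. First, the model-free gradient is replaced by central finite differences, which may differ from the estimator the paper uses. Second, the abstract is cut off before it says how the gradient difference is used. Here it is treated, hypothetically, as a constant correction added to the model gradient until the next line-search failure. Third, the names `composite_descent`, `fd_gradient`, `f_true`, and `grad_model`, and the quadratic-plus-cosine toy cost, are illustrative only.

```python
import numpy as np


def fd_gradient(f, theta, h=1e-6):
    """Central finite-difference gradient of a black-box f (costs 2n evaluations).

    Stand-in for the model-free gradient estimator; the paper's estimator may differ.
    """
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return grad


def composite_descent(f, grad_model, theta0, alpha0=1.0, beta=0.5, sigma=1e-4,
                      alpha_min=1e-8, tol=1e-6, max_iter=500):
    """Gradient descent on the true cost f, driven mostly by the model gradient.

    f          : true cost, treated as a black box (e.g. a simulation rollout)
    grad_model : closed-form gradient of the cost under the approximate linear model
    Returns the final iterate and the number of evaluations of f.
    """
    theta = np.asarray(theta0, dtype=float)
    correction = np.zeros_like(theta)  # ASSUMED use of (exact grad - model grad)
    f_theta = f(theta)
    n_evals = 1
    for _ in range(max_iter):
        p = grad_model(theta) + correction  # surrogate gradient, no calls to f
        step_ok = False
        if np.linalg.norm(p) >= tol:
            alpha = alpha0
            # Backtracking (Armijo) line search on the true cost along -p.
            while alpha >= alpha_min:
                candidate = theta - alpha * p
                f_cand = f(candidate)
                n_evals += 1
                if f_cand <= f_theta - sigma * alpha * (p @ p):
                    step_ok = True
                    break
                alpha *= beta
        if step_ok:
            theta, f_theta = candidate, f_cand
            continue
        # Line search failed (or the surrogate gradient vanished): query the
        # model-free gradient and compare it with the model gradient.
        g_exact = fd_gradient(f, theta)
        n_evals += 2 * theta.size
        if np.linalg.norm(g_exact) < tol:
            break  # approximately stationary for the true cost
        correction = g_exact - grad_model(theta)
    return theta, n_evals


# Hypothetical toy problem: an explicit quadratic (the "linear-model" part)
# plus a small smooth perturbation playing the role of the modeling error.
P = np.diag([1.0, 2.0, 3.0])
q = np.ones(3)
EPS = 0.05


def f_true(theta):
    return 0.5 * theta @ P @ theta - q @ theta + EPS * np.sum(np.cos(theta))


def grad_model(theta):
    return P @ theta - q  # gradient of the quadratic part only


theta_hat, evals = composite_descent(f_true, grad_model, np.zeros(3))
```

In this sketch the true cost is queried only inside the line search. It is also queried 2n times for a finite-difference gradient, but only when the line search fails. How much this saves relative to a purely model-free method depends on the size of the perturbation. The sketch makes no claim about the paper's reported results.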

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Control Systems Optimization · Machine Learning and Algorithms