Approximation Benefits of Policy Gradient Methods with Aggregated States

Daniel Russo

arXiv:2007.11684·cs.LG·June 24, 2022·1 cites

TL;DR

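This paper shows that, with a state-aggregated representation, a policy gradient method converges to a policy whose per-period regret is bounded by $ϵ$, the largest gap between state-action values within a common partition. With the same representation, approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $ϵ / (1 - γ)$.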

Contribution

It provides a theoretical analysis of state-aggregated representations, proving a per-period regret bound of $ϵ$ for a policy gradient method and showing that approximate policy iteration and approximate value iteration can do worse by a factor of $1 / (1 - γ)$.

Findings

01

Policy gradient converges to a policy whose per-period regret is bounded by $ϵ$, the largest difference between two state-action values belonging to a common partition.

02

Approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $ϵ / (1 - γ)$, where $γ$ is the discount factor.

03

Methods optimizing the true decision-objective locally are more robust to approximation errors.

Abstract

Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, where the state space is partitioned and either the policy or value function approximation is held constant over partitions. This paper shows a policy gradient method converges to a policy whose per-period regret is bounded by $ϵ$, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $ϵ / (1 - γ)$, where $γ$ is a discount factor. Faced with inherent approximation error, methods that locally optimize the true decision-objective can be far more robust.
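As a rough illustration of the quantity $ϵ$ described in the abstract, the sketch below computes the largest within-partition gap for a small, made-up state-action value table. Everything here is an assumption for illustration: the function name `within_partition_gap`, the table `q_table`, the partition labels, and the discount factor are hypothetical, and the sketch reads "two elements belonging to a common partition" as values of the same action at two states sharing a partition. The abstract does not specify which state-action value function is used, so the table is simply taken as given.

```python
from collections import defaultdict


def within_partition_gap(q, partition):
    """Largest |q[s][a] - q[s2][a]| over actions a and states s, s2 in the same block.

    q         -- list of rows, q[s][a] is the value of action a in state s (hypothetical data)
    partition -- partition[s] is the block label of state s
    """
    # Group state indices by the block they are aggregated into.
    groups = defaultdict(list)
    for state, block in enumerate(partition):
        groups[block].append(state)

    n_actions = len(q[0])
    gap = 0.0
    for states in groups.values():
        for a in range(n_actions):
            values = [q[s][a] for s in states]
            gap = max(gap, max(values) - min(values))
    return gap


# Made-up example: four states, two actions, two blocks of two states each.
q_table = [
    [1.0, 0.5],
    [1.2, 0.4],
    [3.0, 2.0],
    [3.1, 2.5],
]
partition = [0, 0, 1, 1]

epsilon = within_partition_gap(q_table, partition)

# Compare the two scalings named in the abstract for an assumed discount factor.
gamma = 0.99
pg_scale = epsilon                  # policy gradient: regret bounded by epsilon
api_scale = epsilon / (1 - gamma)   # API / AVI: regret can scale as epsilon / (1 - gamma)
print(epsilon, pg_scale, api_scale)
```

With a discount factor close to 1, the second quantity is larger than the first by the factor $1 / (1 - γ)$, which is the gap in robustness the abstract points to.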

Taxonomy

Topics: Markov Chains and Monte Carlo Methods