CASA: Bridging the Gap between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration

Changnan Xiao; Haosen Shi; Jiajun Fan; Shihong Deng; Haiyan Yin

arXiv:2105.03923 · cs.LG · February 28, 2023

TL;DR

This paper introduces a conflict-averse policy iteration method that regularizes the inconsistency between policy evaluation and improvement in model-free reinforcement learning, leading to improved performance and reduced approximation errors.

Contribution

The paper proposes a novel regularization approach that aligns policy evaluation with policy improvement, bridging the gap between the two steps of generalized policy iteration and enhancing learning stability.

Findings

01 Outperforms strong baselines on the Arcade Learning Environment

02 Reduces functional approximation error in policy iteration

03 Prevents policies from being trapped in suboptimal solutions

Abstract

We study the problem of model-free reinforcement learning, which is often solved following the principle of Generalized Policy Iteration (GPI). While GPI is typically an interplay between policy evaluation and policy improvement, most conventional model-free methods assume the independence of the granularity and other details of the GPI steps, despite the inherent connections between them. In this paper, we present a method that regularizes the inconsistency between policy evaluation and policy improvement, leading to a conflict averse GPI solution with reduced functional approximation error. To this end, we formulate a novel learning paradigm where taking the policy evaluation step is equivalent to some compensation of performing policy improvement, and thus effectively alleviates the gradient conflict between the two GPI steps. We also show that the form of our proposed solution is…
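To make "gradient conflict between the two GPI steps" concrete, here is a minimal toy sketch. It is not the paper's method or parameterization. It assumes, hypothetically, a single state whose logits `theta` serve both as Q-value estimates and as softmax policy logits, a squared-error evaluation loss, and an expected-Q improvement loss. The helper names `softmax`, `gpi_gradients`, and `cosine` are illustrative only. The two gradients conflict when their cosine similarity is negative: to first order, a descent step on one loss then increases the other.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def gpi_gradients(theta: List[float], q_target: List[float]) -> Tuple[List[float], List[float]]:
    """Gradients of two toy GPI losses w.r.t. shared parameters theta.

    Hypothetical coupling (an assumption, not CASA's formulation): theta is
    used both as the Q-value estimate and as the policy logits of one state.
      evaluation loss  L_E = 0.5 * sum_a (theta_a - q_target_a)^2
      improvement loss L_I = -sum_a pi_a(theta) * q_target_a
    """
    pi = softmax(theta)
    # dL_E/dtheta_a = theta_a - q_target_a
    g_eval = [t - y for t, y in zip(theta, q_target)]
    # E_pi[Q], the baseline in the softmax policy gradient
    baseline = sum(p * y for p, y in zip(pi, q_target))
    # d/dtheta_j sum_a pi_a y_a = pi_j * (y_j - E_pi[Q]); negated because L_I is a loss
    g_impr = [-p * (y - baseline) for p, y in zip(pi, q_target)]
    return g_eval, g_impr


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


if __name__ == "__main__":
    # Action 0 is overestimated (2.0 vs. target 1.0) yet is the better action:
    # evaluation pushes theta_0 down, improvement pushes it up.
    g_e, g_i = gpi_gradients([2.0, 0.0], [1.0, 0.0])
    print(f"conflict cosine: {cosine(g_e, g_i):.4f}")  # -0.7071
    # Action 0 is underestimated and better: both steps push theta_0 up.
    g_e, g_i = gpi_gradients([0.0, 0.0], [1.0, 0.0])
    print(f"aligned cosine:  {cosine(g_e, g_i):.4f}")  # 0.7071
```

In the first case, naively sharing parameters makes the two GPI steps pull in opposite directions. In the second, they agree. Reducing that kind of disagreement is the motivation the abstract describes. How CASA itself couples evaluation and improvement is specified in the paper, not in this sketch.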
