Parameter-Based Value Functions

Francesco Faccio; Louis Kirsch; Jürgen Schmidhuber

arXiv:2006.09226 · cs.LG · August 16, 2021 · 1 citation

TL;DR

This paper introduces Parameter-Based Value Functions (PBVFs), value functions that take the policy parameters as an input. A single PBVF can therefore generalize across policies and supports zero-shot learning of new policies. The resulting algorithms are evaluated on control tasks, where they perform competitively.

Contribution

The paper proposes PBVFs that include policy parameters as inputs, allowing for policy generalization and zero-shot learning, which is a novel approach in off-policy RL.

Findings

01. PBVFs enable evaluation of multiple policies with a single function.

02. Off-policy algorithms based on PBVFs perform comparably to state-of-the-art methods.

03. PBVFs enable zero-shot learning of new policies that outperform the policies seen during training.

Abstract

Traditional off-policy actor-critic Reinforcement Learning (RL) algorithms learn value functions of a single target policy. However, when value functions are updated to track the learned policy, they forget potentially useful information about old policies. We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters. They can generalize across different policies. PBVFs can evaluate the performance of any policy given a state, a state-action pair, or a distribution over the RL agent's initial states. First we show how PBVFs yield novel off-policy policy gradient theorems. Then we derive off-policy actor-critic algorithms based on PBVFs trained by Monte Carlo or Temporal Difference methods. We show how learned PBVFs can zero-shot learn new policies that outperform any policy seen during training. Finally our algorithms…
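
As a rough illustration of the idea in the abstract, the following toy sketch fits a value function whose only input is the policy parameter and then improves a new policy purely by gradient ascent on that learned function. It is not the authors' implementation. Everything in it is an assumption made for illustration: the one-parameter policy `theta`, the hypothetical return function `expected_return` (peak at `theta = 2`), the quadratic features, and the least-squares fit on Monte Carlo returns.

```python
import random


def expected_return(theta):
    """Hypothetical noise-free return of a 1-parameter policy (peak at theta = 2)."""
    return -(theta - 2.0) ** 2


def noisy_return(theta, rng):
    """One Monte Carlo return sample for the policy with parameter theta."""
    return expected_return(theta) + rng.gauss(0.0, 0.05)


def features(theta):
    """Quadratic features of the policy parameter (an illustrative choice)."""
    return (1.0, theta, theta * theta)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_pbvf(data):
    """Least-squares fit of V_w(theta) = w . features(theta) to (theta, return) pairs."""
    n = 3
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for theta, ret in data:
        phi = features(theta)
        for i in range(n):
            atb[i] += phi[i] * ret
            for j in range(n):
                ata[i][j] += phi[i] * phi[j]
    return solve(ata, atb)


def value_grad(w, theta):
    """Gradient of the learned value function with respect to theta."""
    return w[1] + 2.0 * w[2] * theta


def zero_shot_policy(w, theta0=0.0, lr=0.1, steps=200):
    """Gradient ascent on the learned value function, with no new environment data."""
    theta = theta0
    for _ in range(steps):
        theta += lr * value_grad(w, theta)
    return theta


rng = random.Random(0)
# Behaviour policies are drawn from [-1, 1], so none of them is near the optimum.
train_thetas = [rng.uniform(-1.0, 1.0) for _ in range(200)]
data = [(t, noisy_return(t, rng)) for t in train_thetas]

w = fit_pbvf(data)
theta_new = zero_shot_policy(w)
best_train = max(expected_return(t) for t in train_thetas)
print("new policy parameter:", theta_new)
print("its return:", expected_return(theta_new), "best training return:", best_train)
```

Because the learned function takes `theta` as input, it can be differentiated with respect to the policy parameter directly, which is what allows the improvement step to run without further interaction. The extrapolation works here only because the toy return is exactly quadratic; nothing in this sketch says how well that carries over to the neural-network setting.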

Code & Models

Repositories

ff93/parameter-based-value-functions (PyTorch, official)

Videos

Parameter-Based Value Functions · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques