Parameter-Based Value Functions
Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

TL;DR
This paper introduces Parameter-Based Value Functions (PBVFs), value functions that take the policy parameters as an additional input. A single PBVF can therefore generalize across policies and supports zero-shot learning of new policies. The resulting off-policy actor-critic algorithms are evaluated on control tasks, where they perform competitively.
Contribution
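The paper proposes PBVFs, value functions whose inputs include the policy parameters. Traditional off-policy actor-critic methods learn the value function of a single target policy and forget information about older policies as that policy changes. PBVFs instead generalize across policies and support zero-shot learning of new ones.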
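As a rough formalization of "policy parameters as inputs", the three kinds of PBVF listed in the abstract (conditioned on a state, on a state-action pair, or on the initial-state distribution) could be written as below. The notation is an assumption made for illustration and is not taken from this page: \pi_\theta is the policy with parameters \theta, \gamma a discount factor, r the reward, and \mu_0 the initial-state distribution.

```latex
\begin{align*}
V(s, \theta) &= \mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ \pi_\theta\right], \\
Q(s, a, \theta) &= \mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a,\ \pi_\theta\right], \\
V(\theta) &= \mathbb{E}_{s_0 \sim \mu_0}\!\left[\, V(s_0, \theta) \,\right].
\end{align*}
```

The only difference from ordinary value functions is that \theta is an argument rather than a fixed superscript, so one function covers every policy in the parameter space.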
Findings
PBVFs enable evaluation of multiple policies with a single function.
Off-policy algorithms based on PBVFs perform comparably to state-of-the-art methods.
Learned PBVFs can zero-shot learn new policies that outperform every policy seen during training.
Abstract
Traditional off-policy actor-critic Reinforcement Learning (RL) algorithms learn value functions of a single target policy. However, when value functions are updated to track the learned policy, they forget potentially useful information about old policies. We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters. They can generalize across different policies. PBVFs can evaluate the performance of any policy given a state, a state-action pair, or a distribution over the RL agent's initial states. First we show how PBVFs yield novel off-policy policy gradient theorems. Then we derive off-policy actor-critic algorithms based on PBVFs trained by Monte Carlo or Temporal Difference methods. We show how learned PBVFs can zero-shot learn new policies that outperform any policy seen during training. Finally our algorithms…
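The idea of evaluating many policies with one function, and then improving a policy without further interaction, can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the authors' algorithm or code: the one-step task, the linear policy, the quadratic feature map, and all names (START_STATES, expected_return, features, feature_grad, V) are hypothetical. Because the toy task is deterministic, the returns used as regression targets are exact rather than noisy Monte Carlo samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: one-step episodes, start state uniform over START_STATES.
START_STATES = np.array([-1.0, 0.0, 1.0])

def expected_return(theta):
    """Exact expected return of the linear policy a = theta[0] * s + theta[1]."""
    actions = theta[0] * START_STATES + theta[1]
    rewards = -(actions - (2.0 * START_STATES + 1.0)) ** 2
    return rewards.mean()

def features(theta):
    """Quadratic features of the policy parameters (an assumed function class)."""
    t0, t1 = theta
    return np.array([1.0, t0, t1, t0 * t0, t0 * t1, t1 * t1])

def feature_grad(theta):
    """Jacobian of features(theta) with respect to theta, shape (6, 2)."""
    t0, t1 = theta
    return np.array(
        [[0, 0], [1, 0], [0, 1], [2 * t0, 0], [t1, t0], [0, 2 * t1]],
        dtype=float,
    )

# 1) Evaluate many different training policies and record (theta, return) pairs.
train_thetas = rng.uniform(-1.0, 1.0, size=(200, 2))
train_returns = np.array([expected_return(th) for th in train_thetas])

# 2) Fit ONE value function V(theta) = w . features(theta) to all of them.
Phi = np.stack([features(th) for th in train_thetas])
w, *_ = np.linalg.lstsq(Phi, train_returns, rcond=None)

def V(theta):
    """Learned value of the policy with parameters theta."""
    return features(theta) @ w

# 3) Zero-shot improvement: gradient ascent on theta through the learned V,
#    starting from the best training policy, with no new environment data.
theta_new = train_thetas[np.argmax(train_returns)].copy()
for _ in range(300):
    theta_new += 0.1 * (feature_grad(theta_new).T @ w)

best_train = train_returns.max()
new_return = expected_return(theta_new)
print("best training return:", best_train)
print("return of zero-shot policy:", new_return)
```

In this toy setting the training policies are sampled from a box that does not contain the optimal parameters, so the policy found by ascending the learned value function lies outside the training set. A neural network would replace the fixed feature map in any realistic setting; the quadratic features are chosen here only because they represent this particular return exactly.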
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques
