What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience

Alexandra Kearney; Anna Koop; Johannes Günther; Patrick M. Pilarski

arXiv:2206.06485 · cs.LG · June 15, 2022

TL;DR

This paper introduces a meta-gradient descent method that lets reinforcement learning agents autonomously select and learn useful predictive features, specifically General Value Functions (GVFs), during continual interaction with the environment.
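For readers new to GVFs, the sketch below shows a single GVF question (the expected discounted sum of a cumulant signal) answered with a linear function of features and learned by TD(0), the standard way such predictions are learned. It is an illustrative assumption, not the paper's code: the class name `LinearGVF`, the step size, and the fixed discount are hypothetical, and the paper's agents may use a different learner or function approximator.

```python
class LinearGVF:
    """One hypothetical GVF 'question': the expected discounted sum of a
    cumulant signal, answered linearly in the features and learned by TD(0)."""

    def __init__(self, n_features: int, gamma: float, alpha: float = 0.1):
        self.gamma = gamma            # continuation (discount) of the question
        self.alpha = alpha            # TD step size
        self.w = [0.0] * n_features   # linear weights of the answer

    def predict(self, x):
        # Current estimate v(x) = w . x
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, cumulant, x_next):
        # TD error: delta = c + gamma * v(x') - v(x)
        delta = cumulant + self.gamma * self.predict(x_next) - self.predict(x)
        # Semi-gradient TD(0) step on the linear weights
        self.w = [wi + self.alpha * delta * xi for wi, xi in zip(self.w, x)]
        return delta


if __name__ == "__main__":
    # One state that always emits a cumulant of 1: the true answer is
    # 1 / (1 - gamma) = 10 for gamma = 0.9.
    gvf = LinearGVF(n_features=1, gamma=0.9)
    for _ in range(2000):
        gvf.update([1.0], 1.0, [1.0])
    print(round(gvf.predict([1.0]), 3))  # prints 10.0
```

Changing the cumulant, the discount, or the features changes which question is being asked; choosing among those questions is exactly the problem the paper addresses.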

Contribution

The paper presents a novel meta-gradient approach that lets an agent learn what predictions to make, how to estimate them, and how to use them for decision-making, all within a single continual learning process.
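The meta-gradient idea can be illustrated with a toy case. Assume, hypothetically, that "what to predict" is parameterized by cumulant weights `beta` over observation components, so the GVF predicts the discounted sum of `beta . o`. Because linear TD(0) updates are linear in `beta`, the sensitivity `H[j] = d w / d beta[j]` can be carried forward exactly alongside the weights, and the control learner's TD error can then be pushed back through the prediction to adjust `beta`. This is a minimal sketch under those assumptions; the function names, the choice of meta-parameter, and the semi-gradient outer loss are not taken from the paper.

```python
import random
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def td_with_meta_trace(stream, beta, gamma=0.9, alpha=0.1):
    """Run linear TD(0) for a GVF whose cumulant is c = beta . o, while
    tracking H[j] = d w / d beta[j] for each hypothetical cumulant weight."""
    n = len(stream[0][0])  # number of GVF input features
    m = len(beta)          # number of observation components in the cumulant
    w = [0.0] * n
    H = [[0.0] * n for _ in range(m)]
    for x, o, x_next in stream:
        delta = dot(beta, o) + gamma * dot(w, x_next) - dot(w, x)
        # d delta / d beta[j], using the sensitivities from before this step
        d_delta = [o[j] + gamma * dot(H[j], x_next) - dot(H[j], x) for j in range(m)]
        w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
        H = [[h + alpha * d_delta[j] * xi for h, xi in zip(H[j], x)] for j in range(m)]
    return w, H


def meta_step(beta, H, x, ctrl_td_error, theta_pred, eta=0.01):
    """Semi-gradient descent on 0.5 * ctrl_td_error**2 with respect to beta,
    assuming the control learner weights the GVF prediction p = w . x by
    theta_pred, so that d p / d beta[j] = H[j] . x."""
    return [b + eta * ctrl_td_error * theta_pred * dot(H[j], x)
            for j, b in enumerate(beta)]


if __name__ == "__main__":
    random.seed(0)
    xs = [[random.random() for _ in range(3)] for _ in range(201)]
    obs = [[random.random() for _ in range(2)] for _ in range(200)]
    stream = [(xs[t], obs[t], xs[t + 1]) for t in range(200)]
    w, H = td_with_meta_trace(stream, beta=[0.5, 0.5])
    beta = meta_step([0.5, 0.5], H, xs[-1], ctrl_td_error=0.3, theta_pred=1.0)
    print(beta)
```

Carrying `H` forward online means the meta-gradient never requires storing or replaying past experience, which matters in the single-stream setting the paper targets.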

Findings

1. Agents can independently select predictions that resolve partial observability.
2. Autonomously learned predictions yield performance comparable to manually specified GVFs.
3. The method enables self-supervised identification of predictions that improve decision-making.

Abstract

In computational reinforcement learning, a growing body of work seeks to construct an agent's perception of the world through predictions of future sensations; predictions about environment observations are used as additional input features to enable better goal-directed decision-making. An open challenge in this line of work is determining which of the infinitely many predictions an agent could make would best support decision-making. This challenge is especially apparent in continual learning problems where a single stream of experience is available to a single agent. As a primary contribution, we introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward, all during a single ongoing…
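The abstract describes predictions being used as additional input features for the control policy. A minimal sketch of that wiring, assuming linear GVFs and using linear Q-learning as a stand-in control learner (the text here does not name the paper's control algorithm), appends each GVF's prediction to the raw observation before the action-value update. `augment`, `q_learning_step`, and `greedy_action` are hypothetical helpers.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def augment(obs: Sequence[float], gvf_weights: Sequence[Sequence[float]],
            x: Sequence[float]) -> List[float]:
    """Append each GVF's current prediction w . x to the raw observation,
    giving the control learner its input feature vector."""
    return list(obs) + [dot(w, x) for w in gvf_weights]


def q_learning_step(theta: List[List[float]], phi: Sequence[float], action: int,
                    reward: float, phi_next: Sequence[float],
                    gamma: float = 0.9, alpha: float = 0.1) -> float:
    """Linear Q-learning on augmented features; updates theta in place and
    returns the control TD error."""
    q_next = max(dot(theta_a, phi_next) for theta_a in theta)
    delta = reward + gamma * q_next - dot(theta[action], phi)
    theta[action] = [t + alpha * delta * p for t, p in zip(theta[action], phi)]
    return delta


def greedy_action(theta: List[List[float]], phi: Sequence[float]) -> int:
    q = [dot(theta_a, phi) for theta_a in theta]
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    # Two raw observation components plus one GVF prediction (2.0 * 0.5 = 1.0).
    phi = augment([1.0, 0.0], [[2.0, 0.0]], [0.5, 1.0])
    theta = [[0.0] * 3, [0.0] * 3]  # one weight vector per action
    delta = q_learning_step(theta, phi, action=0, reward=1.0, phi_next=phi, alpha=0.5)
    print(phi, delta, greedy_action(theta, phi))  # prints [1.0, 0.0, 1.0] 1.0 0
```

The returned TD error is the kind of control signal that a meta-gradient step could propagate back through the appended prediction features to decide which predictions are worth keeping.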
