Orchestrated Value Mapping for Reinforcement Learning

Mehdi Fatemi; Arash Tavakoli

arXiv:2203.07171 · cs.LG · March 18, 2022 · 1 citation

TL;DR

This paper introduces a unified framework for reinforcement learning algorithms that use value mapping and reward decomposition, enabling enhanced learning properties and generalizing existing methods.

Contribution

The paper presents a convergent class of RL algorithms based on value mapping and reward channels, broadening the scope of existing algorithms and providing a new convergence proof.

Findings

01 The proposed framework generalizes Q-Learning, Log Q-Learning, and Q-Decomposition.

02 A specific algorithm instantiated from the framework performs well on Atari games.

03 The convergence proof relaxes some assumptions of prior algorithms.

Abstract

We present a general convergent class of reinforcement learning algorithms that is founded on two distinct principles: (1) mapping value estimates to a different space using arbitrary functions from a broad class, and (2) linearly decomposing the reward signal into multiple channels. The first principle enables incorporating specific properties into the value estimator that can enhance learning. The second principle, on the other hand, allows for the value function to be represented as a composition of multiple utility functions. This can be leveraged for various purposes, e.g. dealing with highly varying reward scales, incorporating a priori knowledge about the sources of reward, and ensemble learning. Combining the two principles yields a general blueprint for instantiating convergent algorithms by orchestrating diverse mapping functions over multiple reward channels. This blueprint…
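The two principles in the abstract can be pictured with a small tabular sketch. This is not the authors' algorithm, nor the code in the official repository: the names `Channel`, `combined_q`, and `update`, the `log1p`/`expm1` mapping pair, the single learning rate, and the toy rewards are all assumptions chosen for illustration. Each channel stores its estimates in a mapped space, and the overall value is a weighted sum of the channels' estimates mapped back to the regular space.

```python
import math
from collections import defaultdict


class Channel:
    """One reward channel (hypothetical helper): estimates live in a mapped space."""

    def __init__(self, f, f_inv, weight=1.0, init_value=0.0):
        self.f = f            # mapping applied to targets before storing
        self.f_inv = f_inv    # inverse mapping, back to the regular value space
        self.weight = weight  # coefficient of this channel in the linear decomposition
        init_mapped = f(init_value)
        self.mapped = defaultdict(lambda: init_mapped)  # key: (state, action)

    def value(self, s, a):
        """Estimate for (s, a), mapped back to the regular space."""
        return self.f_inv(self.mapped[(s, a)])


def combined_q(channels, s, a):
    """Overall action value: weighted sum of the per-channel estimates."""
    return sum(c.weight * c.value(s, a) for c in channels)


def update(channels, actions, s, a, channel_rewards, s_next, done, gamma=0.9, lr=0.5):
    """One Q-learning-style step per channel (illustrative only).

    Assumes the full reward equals sum(weight_j * channel_rewards[j]) and that
    every target lies in the domain of the channel's mapping f.
    """
    # The greedy next action is chosen with respect to the combined value.
    a_next = max(actions, key=lambda b: combined_q(channels, s_next, b))
    for c, r in zip(channels, channel_rewards):
        bootstrap = 0.0 if done else c.value(s_next, a_next)
        target = r + gamma * bootstrap
        key = (s, a)
        # Move the stored (mapped) estimate toward the mapped target.
        c.mapped[key] += lr * (c.f(target) - c.mapped[key])


# Toy usage: an identity channel plus a logarithmic channel.
demo_channels = [
    Channel(lambda x: x, lambda y: y),   # plain Q-learning-like channel
    Channel(math.log1p, math.expm1),     # log-space channel, needs targets > -1
]
update(demo_channels, actions=[0, 1], s="s0", a=0,
       channel_rewards=(1.0, 3.0), s_next="s1", done=True, lr=1.0)
print(combined_q(demo_channels, "s0", 0))
```

With only the identity channel, the update reduces to ordinary tabular Q-learning; the logarithmic channel shows how a mapping can compress a reward source with a much larger scale. The convergence conditions discussed in the paper are not checked by this sketch.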

Code & Models

Repositories

microsoft/orchestrated-value-mapping (tf, Official)

Videos

Orchestrated Value Mapping for Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Q-Learning