Orchestrated Value Mapping for Reinforcement Learning
Mehdi Fatemi, Arash Tavakoli

TL;DR
This paper introduces a unified framework for reinforcement learning algorithms that use value mapping and reward decomposition. The framework lets desirable properties be built into the value estimator and generalizes existing methods.
Contribution
The paper presents a convergent class of RL algorithms based on value mapping and reward channels, broadening the scope of existing algorithms and providing a new convergence proof.
Findings
The proposed framework generalizes Q-Learning, Log Q-Learning, and Q-Decomposition.
A specific algorithm instantiated from the framework performs well on Atari games.
The convergence proof relaxes some assumptions of prior algorithms.
Abstract
We present a general convergent class of reinforcement learning algorithms that is founded on two distinct principles: (1) mapping value estimates to a different space using arbitrary functions from a broad class, and (2) linearly decomposing the reward signal into multiple channels. The first principle enables incorporating specific properties into the value estimator that can enhance learning. The second principle, on the other hand, allows for the value function to be represented as a composition of multiple utility functions. This can be leveraged for various purposes, e.g. dealing with highly varying reward scales, incorporating a priori knowledge about the sources of reward, and ensemble learning. Combining the two principles yields a general blueprint for instantiating convergent algorithms by orchestrating diverse mapping functions over multiple reward channels. This blueprint…
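The two principles in the abstract can be illustrated with a small tabular sketch. This is a simplified illustration under stated assumptions, not the paper's algorithm: the names `Channel`, `q_value`, `greedy_action`, `update`, `log_map`, and the constant `EPS` are hypothetical, the log mapping is just one example of an invertible mapping, and a single step size is used where the paper's update rule may differ. Each channel stores its estimate in a mapped space, and the overall action value is the weighted sum of the inverse-mapped channel estimates.

```python
import math
from collections import defaultdict


class Channel:
    """One reward channel with its own mapping f and inverse f_inv (illustrative names)."""

    def __init__(self, f, f_inv, weight=1.0):
        self.f = f
        self.f_inv = f_inv
        self.weight = weight
        init = f(0.0)  # start each estimate at the mapped image of zero
        self.q_tilde = defaultdict(lambda: init)  # mapped-space table keyed by (state, action)


def q_value(channels, state, action):
    """Combine channels: weighted sum of estimates mapped back to regular space."""
    return sum(ch.weight * ch.f_inv(ch.q_tilde[(state, action)]) for ch in channels)


def greedy_action(channels, state, actions):
    """Pick the action that maximizes the combined value, not any single channel."""
    return max(actions, key=lambda a: q_value(channels, state, a))


def update(channels, state, action, rewards, next_state, actions,
           gamma=0.9, alpha=0.5, done=False):
    """One update per channel; rewards[j] is the reward component of channel j."""
    next_action = greedy_action(channels, next_state, actions)
    for ch, r in zip(channels, rewards):
        # Bootstrap in regular space, then move the mapped estimate toward f(target).
        bootstrap = 0.0 if done else ch.f_inv(ch.q_tilde[(next_state, next_action)])
        target = r + gamma * bootstrap
        key = (state, action)
        ch.q_tilde[key] += alpha * (ch.f(target) - ch.q_tilde[key])


def identity(x):
    return x


EPS = 1e-2  # assumed offset keeping the logarithm defined at zero


def log_map(x):
    # Only valid for targets greater than -EPS, e.g. non-negative reward channels.
    return math.log(x + EPS)


def log_map_inv(y):
    return math.exp(y) - EPS
```

With a single channel that uses `identity` for both `f` and `f_inv` and a weight of 1, `update` reduces to the ordinary tabular Q-learning update. Giving each channel a different mapping, for example `identity` on one and `log_map` on another, is the sense in which mappings are orchestrated over reward channels in this sketch.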
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Q-Learning
