Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Hongyao Tang; Jianye Hao; Guangyong Chen; Pengfei Chen; Chen Chen; Yaodong Yang; Luo Zhang; Wulong Liu; Zhaopeng Meng

arXiv:2103.02225·cs.LG·March 4, 2021·1 cites

TL;DR

This paper introduces Value Decomposition with Future Prediction (VDFP), an RL value estimation method that explicitly models latent future dynamics and trajectory returns, targeting continuous control tasks, including those with sparse and delayed rewards.

Contribution

It proposes a two-step value estimation framework with explicit future prediction and decomposes value into dynamics and return components, along with a practical deep RL algorithm.

Findings

01 Effective in continuous control tasks

02 Improves handling of delayed rewards

03 Outperforms baseline methods in experiments

Abstract

The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), providing an explicit two-step understanding of the value estimation process: 1) first foresee the latent future, and 2) then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, inducing a way to model latent dynamics and returns separately in…
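
The contrast drawn in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the paper or its repository: `discounted_return` and `td0_target` show the usual MC and TD(0) quantities computed directly from rewards, while `trajectory_feature`, `foresee`, and `evaluate` are hypothetical stand-ins for a trajectory representation, a future predictor, and a return model. It assumes fixed-length sampled futures and a return model that is linear in the representation, in which case evaluating the averaged future gives the same number as averaging the returns.

```python
from typing import List


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Monte Carlo return of one trajectory: sum over t of gamma^t * r_t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_target(reward: float, next_value: float, gamma: float) -> float:
    """One-step TD target: r + gamma * V(s')."""
    return reward + gamma * next_value


def trajectory_feature(rewards: List[float], gamma: float) -> List[float]:
    """Hypothetical trajectory representation (assumption, not the paper's
    learned model): the vector of discounted per-step rewards."""
    return [(gamma ** t) * r for t, r in enumerate(rewards)]


def foresee(futures: List[List[float]], gamma: float) -> List[float]:
    """Step 1 (stand-in for a future predictor): the element-wise mean
    representation over sampled futures of equal length."""
    feats = [trajectory_feature(f, gamma) for f in futures]
    return [sum(col) / len(feats) for col in zip(*feats)]


def evaluate(feature: List[float]) -> float:
    """Step 2 (stand-in for a return model): linear in the representation."""
    return sum(feature)


if __name__ == "__main__":
    gamma = 0.5
    # Three sampled futures from the same state; rewards are sparse and delayed.
    futures = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

    mc_value = sum(discounted_return(f, gamma) for f in futures) / len(futures)
    two_step_value = evaluate(foresee(futures, gamma))

    print("MC estimate:      ", mc_value)
    print("Foresee + evaluate:", two_step_value)
    print("TD(0) target:     ", td0_target(0.0, mc_value, gamma))
```

With a non-linear return model the two estimates would generally differ, so the equality here is a property of this toy setup and not a general guarantee.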

Code & Models

Repositories

bluecontra/AAAI2021-VDFP · tf · Official

Videos

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction · Underline

Taxonomy

Topics: Reinforcement Learning in Robotics · Mental Health Research Topics