Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Vitchyr Pong; Shixiang Gu; Murtaza Dalal; Sergey Levine

arXiv:1802.09081·cs.LG·February 25, 2020·44 cites

Open Access

TL;DR

Temporal Difference Models (TDMs) integrate model-free and model-based reinforcement learning to improve sample efficiency and asymptotic performance in continuous control tasks by leveraging rich transition information.

Contribution

Introduces TDMs, a family of goal-conditioned value functions that are trained with model-free methods and can then be used for efficient model-based control, with reported results surpassing existing approaches.

Findings

01. TDMs significantly outperform state-of-the-art methods in continuous control tasks.

02. TDMs achieve higher sample efficiency than traditional model-free RL.

03. TDMs combine advantages of model-free and model-based RL for better performance.

Abstract

Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample complexity is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information contained in state transition tuples. Model-based RL uses this information by training a predictive model, but often does not achieve the same asymptotic performance as model-free RL due to model bias. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn…
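
The abstract describes TDMs as goal-conditioned value functions trained with model-free learning. One way to picture the training signal is the minimal sketch below. It assumes a horizon-conditioned formulation in which the value function receives a goal and a remaining-step count `tau`, is scored by the negative distance to the goal when `tau` reaches 0, and otherwise bootstraps from its own estimate at `tau - 1`. The function name `tdm_target`, the argument names, and the choice of L1 distance are illustrative assumptions, not details taken from this page.

```python
import numpy as np


def tdm_target(next_state, goal, tau, next_q_max):
    """Sketch of a bootstrapped target for a goal- and horizon-conditioned value function.

    next_state, goal: float arrays of shape (batch, state_dim)
    tau:              int array of shape (batch,), steps remaining to reach the goal
    next_q_max:       float array of shape (batch,), an estimate of
                      max over a' of Q(s', a', goal, tau - 1)
    All names here are hypothetical and chosen for illustration.
    """
    # Distance between the state actually reached and the goal (L1 is an assumption).
    distance = np.abs(next_state - goal).sum(axis=1)
    # When no steps remain, the target is the negative distance to the goal;
    # otherwise the target is the bootstrapped value with one fewer step remaining.
    return np.where(tau == 0, -distance, next_q_max)


# Two transitions: the first has no steps left, the second has two steps left.
next_state = np.array([[1.0, 2.0], [0.0, 0.0]])
goal = np.array([[0.0, 0.0], [3.0, 4.0]])
tau = np.array([0, 2])
next_q_max = np.array([-5.0, -1.5])
targets = tdm_target(next_state, goal, tau, next_q_max)
```

Because the goal is an input rather than part of the environment's reward, every observed transition can in principle be reused with many different goals and horizons, which is one reading of how state transitions supply a richer signal than a scalar reward.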

Taxonomy

Topics: Reinforcement Learning in Robotics · Energy Efficiency and Management · Adaptive Dynamic Programming Control