Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Vitchyr Pong, Shixiang Gu, Murtaza Dalal, Sergey Levine

TL;DR
Temporal Difference Models (TDMs) combine model-free and model-based reinforcement learning for continuous control: they exploit the rich information in state transitions to improve sample efficiency over model-free methods, while targeting the asymptotic performance that model-based methods often lose to model bias.
Contribution
Introduction of TDMs, goal-conditioned value functions trained with model-free methods that enable efficient model-based control, surpassing existing approaches.
Findings
TDMs significantly outperform state-of-the-art methods in continuous control tasks.
TDMs achieve higher sample efficiency than traditional model-free RL.
TDMs combine advantages of model-free and model-based RL for better performance.
Abstract
Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample complexity is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information contained in state transition tuples. Model-based RL uses this information by training a predictive model, but often does not achieve the same asymptotic performance as model-free RL due to model bias. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn…
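Illustrative sketch
The abstract describes TDMs as goal-conditioned value functions trained with model-free learning. As a rough illustration of what a training target for such a function might look like, the sketch below assumes a horizon-conditioned formulation: when no steps remain, the target is the negative distance between the reached state and the goal, and otherwise it bootstraps from the value with one fewer step remaining. The function name, argument names, and the choice of L1 distance are assumptions made for this example and are not taken from the text above.

```python
import numpy as np


def tdm_target(next_state, goal, tau, next_q_max):
    """Regression target for a goal-conditioned, horizon-aware value function.

    next_state, goal: arrays of shape (batch, state_dim)
    tau:              integer array of shape (batch,), steps left to reach the goal
    next_q_max:       array of shape (batch,), a stand-in for
                      max over a' of Q(s', a', goal, tau - 1)
    """
    # Terminal case (tau == 0): negative distance between reached state and goal.
    # Using the L1 norm is an assumption of this sketch.
    terminal = -np.abs(next_state - goal).sum(axis=1)
    # Non-terminal case: bootstrap from the value with one fewer step remaining.
    return np.where(tau == 0, terminal, next_q_max)


# Hypothetical batch of two transitions.
s_next = np.array([[1.0, 2.0], [0.0, 0.0]])
g = np.array([[1.5, 2.0], [3.0, 4.0]])
tau = np.array([0, 2])
q_next = np.array([-9.0, -1.25])
targets = tdm_target(s_next, g, tau, q_next)
print(targets)
```

Because the target depends on the full next state rather than on a scalar reward alone, every transition carries a learning signal for any goal and horizon that is relabeled onto it, which is one way to read the abstract's point about using the rich information in state transitions.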
Taxonomy
Topics: Reinforcement Learning in Robotics · Energy Efficiency and Management · Adaptive Dynamic Programming Control
