DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing
Vint Lee, Pieter Abbeel, Youngwoon Lee

TL;DR
DreamSmooth enhances model-based reinforcement learning by predicting smoothed rewards, which mitigates the challenge of sparse reward prediction, leading to improved sample efficiency and performance on complex tasks.
Contribution
The paper introduces DreamSmooth, a reward smoothing technique that improves reward prediction in model-based reinforcement learning (MBRL), especially for sparse rewards, and achieves state-of-the-art results on long-horizon sparse-reward tasks.
Findings
State-of-the-art performance on sparse-reward tasks
Improved sample efficiency in long-horizon tasks
No loss in performance on standard benchmarks
Abstract
Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards. Despite its success, we found that, surprisingly, reward prediction is often a bottleneck of MBRL, especially for sparse rewards that are challenging (or even ambiguous) to predict. Motivated by the intuition that humans can learn from rough reward estimates, we propose a simple yet effective reward smoothing approach, DreamSmooth, which learns to predict a temporally smoothed reward instead of the exact reward at the given timestep. We empirically show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks both in sample efficiency and final performance without losing performance on common benchmarks, such as DeepMind Control Suite and…
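To make the idea of a temporally smoothed reward concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract above does not specify the smoothing function, so the Gaussian kernel, the default `sigma`, the clamped edge handling, and the names `gaussian_kernel` and `smooth_rewards` are all hypothetical choices made for this example.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel as a list of weights.

    Assumption: a Gaussian is one plausible smoothing kernel; the
    abstract does not say which kernel is used.
    """
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2)
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rewards(rewards, sigma=3.0):
    """Spread each reward of one episode over neighboring timesteps.

    Kernel mass that would fall outside the episode is folded onto the
    first or last timestep (clamped indexing), so the total reward of
    the episode is unchanged. This edge handling is an assumption.
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    num_steps = len(rewards)
    smoothed = [0.0] * num_steps
    for t, reward in enumerate(rewards):
        if reward == 0.0:
            continue  # nothing to spread for zero-reward steps
        for k, weight in enumerate(kernel):
            target = min(max(t + k - radius, 0), num_steps - 1)
            smoothed[target] += reward * weight
    return smoothed


# A sparse episode: a single reward of 1.0 at timestep 10.
sparse = [0.0] * 21
sparse[10] = 1.0
targets = smooth_rewards(sparse, sigma=3.0)
```

In this sketch, `targets` would replace the raw rewards as the regression target for the reward model, so a prediction that is off by a few timesteps still receives partial credit rather than being penalized as a complete miss.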
Taxonomy
Topics: Reinforcement Learning in Robotics
