DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

Vint Lee; Pieter Abbeel; Youngwoon Lee

arXiv:2311.01450·cs.LG·February 20, 2024·1 cites

TL;DR

DreamSmooth enhances model-based reinforcement learning by predicting smoothed rewards, which mitigates the challenge of sparse reward prediction, leading to improved sample efficiency and performance on complex tasks.

Contribution

The paper introduces DreamSmooth, a reward smoothing technique that improves reward prediction in MBRL, especially for sparse rewards, and achieves state-of-the-art results on long-horizon sparse-reward tasks.

Findings

01 State-of-the-art performance on sparse-reward tasks

02 Improved sample efficiency in long-horizon tasks

03 No loss in performance on standard benchmarks

Abstract

Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards. Despite its success, we found that, surprisingly, reward prediction is often a bottleneck of MBRL, especially for sparse rewards that are challenging (or even ambiguous) to predict. Motivated by the intuition that humans can learn from rough reward estimates, we propose a simple yet effective reward smoothing approach, DreamSmooth, which learns to predict a temporally smoothed reward instead of the exact reward at the given timestep. We empirically show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks, in both sample efficiency and final performance, without losing performance on common benchmarks, such as DeepMind Control Suite and…
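To make the idea of a temporally smoothed reward concrete, here is a minimal sketch rather than the authors' implementation. It assumes a Gaussian kernel applied along the time axis of a single episode. The function names `gaussian_kernel` and `smooth_rewards`, the default `sigma`, and the zero padding at episode boundaries are illustrative choices that the abstract does not specify.

```python
import numpy as np


def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    return weights / weights.sum()


def smooth_rewards(rewards, sigma: float = 3.0) -> np.ndarray:
    """Smooth one episode's reward sequence along the time axis.

    Assumption: the kernel is truncated at 3 * sigma and the episode is
    zero-padded at both ends, so the output has the same length as the input.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    radius = int(np.ceil(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    padded = np.pad(rewards, radius, mode="constant")
    return np.convolve(padded, kernel, mode="valid")


# Example: a single sparse reward in the middle of a 50-step episode.
episode_rewards = np.zeros(50)
episode_rewards[25] = 1.0
smoothed = smooth_rewards(episode_rewards, sigma=3.0)
```

In this sketch, the reward model would be trained to regress `smoothed` instead of `episode_rewards`. Because the kernel sums to one, a reward far from the episode boundaries keeps its total mass while being spread over neighbouring timesteps, so predicting it one step early or late costs far less than missing an isolated spike. With zero padding, rewards close to a boundary lose some mass, and a different boundary treatment may be preferable.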

Videos

DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics