Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning

Bang You; Jingming Xie; Youping Chen; Jan Peters; Oleg Arenz

arXiv:2209.05333 · cs.LG · September 13, 2022

TL;DR

This paper introduces a sequential information bottleneck method for deep reinforcement learning that improves exploration efficiency and robustness in noisy environments by learning compressed, task-relevant representations of observations.

Contribution

It proposes a novel sequential information bottleneck objective, together with a variational upper bound used to optimize it, and builds intrinsic rewards on the learned representations to improve exploration in noisy, high-dimensional environments.
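This page does not state the objective itself. As a rough, hypothetical illustration of what a variational upper bound on an information bottleneck loss usually looks like, the sketch below follows the generic deep variational information bottleneck recipe, not necessarily the authors' sequential formulation. It assumes a diagonal-Gaussian encoder q(z|x) with outputs `mu` and `logvar`, a standard-normal prior, and a diagonal-Gaussian predictor of some target y (for example, a future observation or code). All function names and the `beta` default are illustrative assumptions.

```python
import math
from typing import Sequence


def gaussian_kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).

    Averaged over inputs, this KL upper-bounds the mutual information I(X; Z)
    between inputs and codes (the usual variational compression term).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def gaussian_nll(target: Sequence[float], pred_mu: Sequence[float],
                 pred_logvar: Sequence[float]) -> float:
    """Negative log-likelihood of `target` under a diagonal Gaussian prediction.

    Since I(Z; Y) >= H(Y) - E[NLL], minimising the NLL maximises a
    variational lower bound on the predictive information.
    """
    return 0.5 * sum(
        lv + (t - m) ** 2 / math.exp(lv) + math.log(2.0 * math.pi)
        for t, m, lv in zip(target, pred_mu, pred_logvar)
    )


def ib_loss(mu, logvar, target, pred_mu, pred_logvar, beta: float = 1e-3) -> float:
    """Hypothetical per-sample bottleneck loss: prediction NLL + beta * compression KL.

    In expectation it upper-bounds beta * I(X; Z) - I(Z; Y) + H(Y).
    """
    return gaussian_nll(target, pred_mu, pred_logvar) + beta * gaussian_kl_to_standard_normal(mu, logvar)
```

Larger `beta` pushes the codes toward the prior and discards more input detail, which is the mechanism by which a bottleneck can drop task-irrelevant noise; the right value is problem-dependent.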

Findings

01. Achieves better sample efficiency in image-based control tasks.

02. Demonstrates robustness to sensor noise and complex backgrounds.

03. Outperforms exploration methods based on curiosity and entropy maximization.

Abstract

Effective exploration is critical for reinforcement learning agents in environments with sparse rewards or high-dimensional state-action spaces. Recent works based on state-visitation counts, curiosity and entropy-maximization generate intrinsic reward signals to motivate the agent to visit novel states for exploration. However, the agent can get distracted by perturbations to sensor inputs that contain novel but task-irrelevant information, e.g. due to sensor noise or changing background. In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations by modelling and compressing sequential predictive information in time-series observations. For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty based on the learned representations. We…
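The abstract is truncated before the reward formula, so the following is only a hypothetical illustration of "novelty based on learned representations", not the paper's reward. One common choice, used by particle-based entropy estimators in some entropy-maximization exploration methods, scores a state by the distance from its latent code to the k-th nearest neighbour among previously seen codes. The function name, the `log(1 + d)` shaping, and the default `k` are assumptions.

```python
import math
from typing import List, Sequence


def knn_intrinsic_reward(z: Sequence[float], memory: List[Sequence[float]], k: int = 3) -> float:
    """Hypothetical novelty bonus: log(1 + distance to the k-th nearest code in memory).

    A large distance means z lies in a sparsely visited region of latent space.
    If the encoder has compressed away sensor noise, noisy observations of a
    familiar state map to nearby codes and do not earn a large bonus.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not memory:
        return 0.0  # nothing to compare against yet
    dists = sorted(math.dist(z, m) for m in memory)
    kth = dists[min(k, len(dists)) - 1]  # fall back to the farthest point if memory is small
    return math.log(1.0 + kth)
```

In use, the bonus would be computed on the encoder's output for each new observation and added, scaled by some coefficient, to the extrinsic reward; computing it on raw pixels instead would reintroduce the distraction problem the abstract describes.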
