Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning
Bang You, Jingming Xie, Youping Chen, Jan Peters, Oleg Arenz

TL;DR
This paper introduces a sequential information bottleneck method for deep reinforcement learning that improves exploration efficiency and robustness in noisy environments by learning compressed, task-relevant representations of observations.
Contribution
It proposes a sequential information bottleneck objective, together with a variational upper bound on that objective, to improve exploration in noisy, high-dimensional environments.
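For background only, the standard (non-sequential) information bottleneck and a commonly used variational upper bound on its compression term can be written as below. This summary does not state the paper's sequential objective, so the symbols here (input X, target Y, representation Z, trade-off weight \beta, encoder p(z|x), and reference marginal r(z)) are assumptions taken from the classic formulation and should not be read as the authors' exact objective.

```latex
% Classic information bottleneck: compress X into Z while keeping information about Y.
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta \, I(Z;Y)

% Variational upper bound on the compression term, valid for any reference density r(z):
I(X;Z) \;\le\; \mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\!\big( p(z \mid x) \,\|\, r(z) \big) \right]
```

The bound holds because replacing the true marginal p(z) with any r(z) adds the non-negative term D_KL(p(z) || r(z)), which is what makes the compression term tractable to optimize.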
Findings
Achieves better sample efficiency in image-based control tasks.
Demonstrates robustness to sensor noise and complex backgrounds.
Outperforms curiosity-based and entropy-based exploration methods.
Abstract
Effective exploration is critical for reinforcement learning agents in environments with sparse rewards or high-dimensional state-action spaces. Recent works based on state-visitation counts, curiosity and entropy-maximization generate intrinsic reward signals to motivate the agent to visit novel states for exploration. However, the agent can get distracted by perturbations to sensor inputs that contain novel but task-irrelevant information, e.g. due to sensor noise or changing background. In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations by modelling and compressing sequential predictive information in time-series observations. For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty based on the learned representations. We…
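The idea of rewarding novelty in a learned representation space, rather than in raw observations, can be sketched as follows. The truncated abstract does not say how novelty is measured, so this example assumes a k-nearest-neighbor distance in latent space, a common stand-in in entropy-based exploration. The function name `knn_novelty_reward`, the memory buffer, and the synthetic latents are all hypothetical and are not the authors' implementation.

```python
import numpy as np

def knn_novelty_reward(z_batch, memory, k=3):
    """Hypothetical intrinsic reward: log(1 + distance to the k-th nearest stored latent).

    z_batch: (batch, dim) latent codes of newly visited states.
    memory:  (n, dim) latent codes of previously visited states.
    """
    # Pairwise Euclidean distances between new and stored latents, shape (batch, n).
    diffs = z_batch[:, None, :] - memory[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Distance to the k-th nearest neighbor for each new latent.
    kth = np.partition(dists, k - 1, axis=1)[:, k - 1]
    return np.log1p(kth)

# Synthetic latents standing in for encoder outputs (assumption: 8-dimensional codes).
rng = np.random.default_rng(0)
memory = rng.normal(size=(256, 8))
familiar = memory[:4] + 0.01 * rng.normal(size=(4, 8))   # close to visited states
novel = rng.normal(loc=6.0, size=(4, 8))                 # far from visited states

r_familiar = knn_novelty_reward(familiar, memory)
r_novel = knn_novelty_reward(novel, memory)
```

In this sketch, states whose latent codes lie far from everything in memory receive larger rewards. If the encoder discards task-irrelevant variation such as sensor noise or background changes, that variation no longer inflates the distance, which is the intuition behind computing novelty on compressed representations.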
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
