CIG: Exploration via Conditional Information Gain
Tim Joseph, Marcus Fechner, Philipp Stegmaier, Karam Daaboul, J. Marius Zöllner

TL;DR
This paper introduces Conditional Information Gain (CIG), a tractable intrinsic reward for reinforcement learning that combines lifelong and episodic exploration signals and scales to high-dimensional state spaces, matching or outperforming prior exploration methods across 12 tasks.
Contribution
The paper derives a novel CIG reward as a scalable surrogate for trajectory-level information gain, applicable to high-dimensional state spaces and model-based RL.
Findings
CIG outperforms or matches prior exploration methods across 12 tasks.
CIG remains robust in stochastic distractor environments.
CIG scales to high-dimensional state spaces with ensemble disagreement kernels.
Abstract
Intrinsic rewards for exploration in reinforcement learning condition on different contexts: lifelong rewards score each transition against accumulated experience but ignore within-rollout redundancy; episodic rewards penalize intra-trajectory repetition but discard lifetime progress. Hybrid methods combine both signals through heuristic weights or require Gaussian-process dynamics that do not scale beyond low-dimensional state spaces. Trajectory-level information gain decomposes into per-step terms that condition on the replay buffer and rollout prefix simultaneously, but remains intractable for deep models. We derive the Conditional Information Gain (CIG) reward as a tractable surrogate: a log-determinant objective over an ensemble disagreement kernel whose Cholesky factorization yields causal per-step rewards that retain both conditioning sets while scaling to high-dimensional state…
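The log-determinant and Cholesky step described in the abstract can be illustrated with a small NumPy sketch. It is a generic illustration of the linear-algebra identity, not the authors' implementation. The kernel construction in `disagreement_kernel`, the `noise` parameter, and both function names are assumptions made for this example, and conditioning on the replay buffer is assumed to enter only through the ensemble that produces the predictions. The sketch shows that the log-determinant of `I + K / noise` equals the sum of `2 * log(L[t, t])` over the Cholesky factor `L`, and that each term depends only on steps up to `t`.

```python
import numpy as np


def disagreement_kernel(preds):
    """Build a (T, T) PSD kernel from ensemble predictions of shape (M, T, D).

    Hypothetical construction: each step is represented by the stacked
    deviations of the M ensemble members from the ensemble mean.
    """
    M, T, D = preds.shape
    dev = preds - preds.mean(axis=0, keepdims=True)    # deviation from ensemble mean
    feats = dev.transpose(1, 0, 2).reshape(T, M * D)   # one feature vector per step
    return feats @ feats.T / M


def per_step_rewards(K, noise=1.0):
    """Split log det(I + K / noise) into one term per trajectory step.

    With A = L @ L.T, log det(A) = sum_t 2 * log(L[t, t]), and L[t, t]
    depends only on the leading (t + 1) x (t + 1) block of A, so each
    term is causal: it conditions on the rollout prefix only.
    """
    A = np.eye(K.shape[0]) + K / noise
    L = np.linalg.cholesky(A)
    return 2.0 * np.log(np.diag(L))


rng = np.random.default_rng(0)
preds = rng.normal(size=(5, 8, 3))   # 5 ensemble members, 8 steps, 3 state dims
preds[:, 4] = preds[:, 3]            # step 4 repeats step 3 within the rollout
K = disagreement_kernel(preds)
rewards = per_step_rewards(K)
```

In this toy rollout the repeated step 4 receives a smaller reward than step 3, because its disagreement is already largely explained by the prefix, while the per-step terms still sum to the trajectory-level log-determinant.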
