Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training
Vin Bhaskara, Haicheng Wang

TL;DR
Curiosity-Critic introduces a novel intrinsic reward based on cumulative prediction error improvement, enabling more effective exploration and world model training by distinguishing learnable from stochastic transitions.
Contribution
The paper proposes a tractable per-step surrogate for cumulative prediction error improvement, estimated with a learned critic, which improves exploration efficiency without requiring oracle knowledge of the noise floor.
Findings
Outperforms existing curiosity methods in stochastic grid world experiments.
Effectively separates reducible from irreducible prediction errors online.
Speeds up training and improves final world model accuracy.
Abstract
Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the error baseline for stochastic ones, effectively separating epistemic…
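The per-step surrogate described in the abstract (current prediction error minus an estimated error baseline) can be illustrated with a minimal sketch. The abstract does not specify the critic's architecture or its regression target, so everything here is an assumption for illustration: the names `ScalarCritic` and `surrogate_reward`, the tabular form, the learning rate, and the hand-picked error values are hypothetical and not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ScalarCritic:
    """Hypothetical tabular critic: one baseline-error estimate per transition key."""
    lr: float = 0.5
    table: dict = field(default_factory=dict)

    def predict(self, key) -> float:
        # Unseen transitions start at 0.0, i.e. treated as fully learnable (an assumption).
        return self.table.get(key, 0.0)

    def update(self, key, target: float) -> None:
        # One step of scalar regression toward the supplied target.
        old = self.predict(key)
        self.table[key] = old + self.lr * (target - old)


def surrogate_reward(current_error: float, baseline: float) -> float:
    """Per-step surrogate: current prediction error minus the estimated error baseline."""
    return current_error - baseline


critic = ScalarCritic(lr=0.5)
# Illustrative targets only: a low error floor for a learnable transition
# and a high noise floor for a stochastic one.
for _ in range(20):
    critic.update("learnable", 0.0)
    critic.update("stochastic", 0.9)

# Both transitions currently show the same prediction error of 0.9.
r_learnable = surrogate_reward(0.9, critic.predict("learnable"))
r_stochastic = surrogate_reward(0.9, critic.predict("stochastic"))
print(r_learnable, r_stochastic)
```

In this toy setup the two transitions have identical current error, so a purely local prediction-error reward could not tell them apart; once the critic's baseline for the stochastic transition approaches its noise floor, the surrogate reward for that transition shrinks while the learnable one stays high.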
