PROWL: Prioritized Regret-Driven Optimization for World Model Learning

Ahmet H. Güzel; Jenny Seidenschwarz; Benjamin Graham; Jonathan Sadeghi; Jeffrey Hawke; Jack Parker-Holder; Ilija Bogunovic

arXiv:2605.18803·cs.LG·May 20, 2026

TL;DR

PROWL introduces an adversarial curriculum and prioritized failure replay to enhance the robustness of world models in rare, critical scenarios, improving their reliability for downstream planning tasks.

Contribution

The paper presents a KL-constrained adversarial training method with a prioritized failure buffer, designed to actively discover rare model failures and learn from them.
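To make the idea of a prioritized failure buffer concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `FailureBuffer`, the eviction rule (drop the lowest-error entry when full), and the sampling rule (probability proportional to error raised to `alpha`) are all hypothetical choices that the summary above does not specify.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class FailureBuffer:
    """Hypothetical buffer that keeps the highest-error trajectories."""

    capacity: int = 1000
    alpha: float = 1.0  # assumed exponent: 0 = uniform, larger = sharper focus
    items: List[Any] = field(default_factory=list)
    priorities: List[float] = field(default_factory=list)

    def add(self, trajectory: Any, error: float) -> bool:
        """Store a trajectory with its world-model error as priority.

        Returns False when the buffer is full and the new error does not
        exceed the lowest stored priority.
        """
        if error < 0:
            raise ValueError("error must be non-negative")
        if len(self.items) >= self.capacity:
            lowest = min(range(len(self.priorities)),
                         key=self.priorities.__getitem__)
            if self.priorities[lowest] >= error:
                return False
            # Evict the least informative (lowest-error) trajectory.
            self.items.pop(lowest)
            self.priorities.pop(lowest)
        self.items.append(trajectory)
        self.priorities.append(error)
        return True

    def sample(self, k: int, rng: Optional[random.Random] = None) -> List[Any]:
        """Draw k trajectories with replacement, weighted by priority."""
        if not self.items:
            return []
        rng = rng or random.Random()
        # Small epsilon keeps the total weight positive if all errors are 0.
        weights = [(p + 1e-8) ** self.alpha for p in self.priorities]
        return rng.choices(self.items, weights=weights, k=k)


if __name__ == "__main__":
    buf = FailureBuffer(capacity=2)
    buf.add("traj_a", 0.1)
    buf.add("traj_b", 0.5)
    buf.add("traj_c", 0.9)  # evicts traj_a, the lowest-error entry
    print(buf.items)
    print(buf.sample(4, random.Random(0)))
```

In a training loop of the kind the summary describes, batches drawn from `sample` would be used to fine-tune the world model, so that rare high-error trajectories are revisited more often than they occur in passive data.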

Findings

01. PROWL improves robustness of world models on out-of-distribution trajectories.

02. PROWL reveals reward-hacking behaviors under weak constraints.

03. Effective training depends on balancing failure discovery with behavioral regularization.

Abstract

Modern action-conditioned video world models achieve strong short-horizon visual realism, yet remain unreliable on rare, interaction-critical transitions that dominate downstream planning and policy performance. Because passive demonstration data systematically under-samples these high-impact regimes, improving robustness requires actively eliciting model failures rather than relying on their natural occurrence. We introduce a KL-constrained adversarial curriculum in which a policy is trained to expose high-error trajectories of a diffusion-based world model while remaining close to the behavior distribution. The world model is continuously fine-tuned on these adversarially discovered trajectories, yielding an adversarial training loop that converts rare failures into a stable, near-distribution training signal without drifting into out-of-distribution exploitation. To maintain pressure…
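One way to read the KL-constrained adversarial curriculum described in the abstract is as a regularized objective for the adversarial policy. The form below is an illustrative sketch, not the paper's stated formulation: the symbols $\pi_\theta$ (adversarial policy), $\pi_b$ (behavior policy underlying the demonstration data), $\mathcal{L}_{\mathrm{WM}}(\tau)$ (world-model prediction error on trajectory $\tau$), and $\beta$ (penalty weight) are all assumed names.

```latex
\max_{\theta}\;
\mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \mathcal{L}_{\mathrm{WM}}(\tau) \right]
\;-\; \beta \, D_{\mathrm{KL}}\!\left( \pi_\theta \,\|\, \pi_b \right)
```

Under this reading, a small $\beta$ lets the policy chase high model error far from the data, which would be consistent with the reward-hacking finding listed above, while a large $\beta$ keeps the policy near the behavior distribution at the cost of discovering fewer failures.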
