How Hard is it to Confuse a World Model?

Waris Radji (Scool; CRIStAL); Odalric-Ambrym Maillard (Scool; CRIStAL)

arXiv:2510.21232 · cs.LG · October 27, 2025

Open Access

TL;DR

This paper formalizes the construction of most confusing instances for neural network world models in reinforcement learning as a constrained optimization problem and proposes an adversarial training procedure to solve it. The resulting analysis relates the degree of confusion to model uncertainty and may inform exploration strategies.

Contribution

The paper introduces a constrained optimization formulation and an adversarial training procedure for generating confusing models in deep RL, addressing the open problem of constructing most confusing instances beyond multi-armed bandits and ergodic tabular Markov decision processes.

Findings

01. Confusion degree correlates with model uncertainty.

02. Adversarial training effectively creates confusing models.

03. Insights may improve exploration strategies in deep RL.

Abstract

In reinforcement learning (RL) theory, the concept of most confusing instances is central to establishing regret lower bounds, that is, the minimal exploration needed to solve a problem. Given a reference model and its optimal policy, a most confusing instance is the statistically closest alternative model that makes a suboptimal policy optimal. While this concept is well-studied in multi-armed bandits and ergodic tabular Markov decision processes, constructing such instances remains an open question in the general case. In this paper, we formalize this problem for neural network world models as a constrained optimization: finding a modified model that is statistically close to the reference one, while producing divergent performance between optimal and suboptimal policies. We propose an adversarial training procedure to solve this problem and conduct an empirical study across world…
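The constrained optimization described in the abstract can be sketched as follows. This is an illustrative formalization, not necessarily the paper's exact objective; all symbols are assumptions introduced here: $M_{\theta}$ is the reference world model with optimal policy $\pi^\star$, $M_{\theta'}$ is the modified model, $\pi$ is a chosen suboptimal policy, $D$ is some statistical divergence (for example a KL divergence between the two models' predictions), $V_{M}(\cdot)$ is the expected return of a policy under model $M$, and $\delta \ge 0$ is a margin.

```latex
% Hypothetical formalization; symbols are assumptions, not taken from the paper.
\begin{aligned}
\min_{\theta'} \quad & D\big(M_{\theta} \,\|\, M_{\theta'}\big) \\
\text{subject to} \quad & V_{M_{\theta'}}(\pi) - V_{M_{\theta'}}(\pi^\star) \;\ge\; \delta .
\end{aligned}

% One possible penalized relaxation with a multiplier \lambda \ge 0:
\mathcal{L}(\theta', \lambda)
  = D\big(M_{\theta} \,\|\, M_{\theta'}\big)
  + \lambda \,\max\!\Big(0,\; \delta - \big[V_{M_{\theta'}}(\pi) - V_{M_{\theta'}}(\pi^\star)\big]\Big).
```

Under these assumptions, the objective keeps the modified model statistically close to the reference, while the constraint forces the suboptimal policy $\pi$ to perform at least as well as $\pi^\star$ in the modified model. The penalized form is one common way to make such a problem amenable to gradient-based training.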

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)