Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

Roussel Desmond Nzoyem; Mauro Comi

arXiv:2605.06298·cs.CV·May 11, 2026


TL;DR

NOVA is a world modeling framework built on implicit neural representations (INRs). It models video efficiently and interpretably, with disentangled scene components and no heavy decoder, which supports controllable forecasting and editing.

Contribution

The paper presents NOVA, a structured world model based on implicit neural representations (INRs) that disentangles scene components and renders its representation analytically, which reduces computational cost and improves interpretability.

Findings

01. NOVA achieves strong controllable forecasting on challenging datasets.

02. The model can disentangle background, foreground, and motion without auxiliary losses.

03. NOVA operates efficiently on a single consumer GPU with approximately 40 million parameters.

Abstract

Training world models on vast quantities of unlabelled videos is a critical step toward fully autonomous intelligence. However, the prevailing paradigm of encoding raw pixels into opaque latent spaces and relying on heavy decoders for reconstruction leaves these models computationally expensive and uninterpretable. We address this problem by introducing NOVA, a world modelling framework that represents the system state as the weights and biases of an auxiliary coordinate-based implicit neural representation (INR). This structured representation is analytically rendered, which eliminates the decoder bottleneck while conferring compactness, portability, and zero-shot super-resolution. Furthermore, like most latent action models, NOVA can be distilled into a context-dependent video generator via an action-matching objective. Surprisingly, without resorting to auxiliary losses or…
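
To make the idea of a state stored as INR weights more concrete, here is a minimal sketch of a coordinate-based network rendered on a pixel grid. It is not NOVA's architecture: the layer sizes, the sine activations, the sigmoid output, and the names `init_inr` and `render` are all assumptions chosen for illustration. It only shows why a weight-space state can be rendered at any resolution without a learned decoder: the same weights are evaluated on a denser coordinate grid.

```python
import numpy as np

def init_inr(rng, sizes=(2, 32, 32, 3)):
    """Random weights and biases for a small coordinate MLP.

    The layer sizes are hypothetical; they are not taken from the paper.
    """
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def render(params, height, width):
    """Evaluate the INR at every pixel centre of a height x width grid.

    Coordinates are (x, y) pairs in [-1, 1]. The output is an RGB image
    with values in (0, 1).
    """
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    h = np.stack([xs, ys], axis=-1).reshape(-1, 2)
    # Hidden layers with sine activations (an assumption of this sketch).
    for w, b in params[:-1]:
        h = np.sin(h @ w + b)
    w, b = params[-1]
    rgb = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid keeps colours in (0, 1)
    return rgb.reshape(height, width, 3)

rng = np.random.default_rng(0)
state = init_inr(rng)          # the "state" is just this list of weights and biases
low = render(state, 16, 16)    # render at a coarse resolution
high = render(state, 64, 64)   # same weights, denser grid, no retraining
n_params = sum(w.size + b.size for w, b in state)
print(low.shape, high.shape, n_params)
```

In this sketch the whole frame is described by `n_params` numbers, and changing the output resolution only changes the coordinate grid passed through the network. Whether NOVA's rendering step has this exact form is not stated in the truncated abstract above.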
