Network-Efficient World Model Token Streaming
Shatadal Mishra, Ahmadreza Moradipari, Nejib Ammar

TL;DR
This paper presents a network-efficient method for streaming discrete world model states in autonomous driving, improving state synchronization and downstream utility under bandwidth limits and packet loss.
Contribution
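It introduces a fully online, label-free keyframe-delta algorithm that prioritizes delta updates by cosine distance in codebook embedding space and triggers keyframes adaptively via a Hamming-drift threshold, improving the rate-distortion frontier over periodic keyframes.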
Findings
Delta updates reduce distortion at low bitrates.
Adaptive keyframe triggering improves the rate-distortion frontier over periodic keyframes at matched bitrates.
Streamed states enhance downstream token prediction accuracy.
Abstract
Generative driving world models rely on compact latent state representations that must be efficiently transmitted and synchronized across distributed compute and connected vehicles. We study network-efficient streaming of a discrete world model state, where a stride-16 VQ-U-Net tokenizer (codebook size 8,192) maps each 288x512 frame to an 18x32 grid of token IDs (576 tokens/frame), equivalent to 936 bytes/frame under fixed-length coding. We consider a keyframe-delta protocol under strict per-message payload budgets and packet loss, and propose a fully online, label-free algorithm that prioritizes delta updates via cosine distance in codebook embedding space and triggers keyframes adaptively using a Hamming-drift threshold. The adaptive algorithm consistently improves the rate-distortion frontier over periodic keyframes at matched bitrates: at 0.024 Mb/s (200-byte budget) dynamic-only…
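The keyframe-delta idea described in the abstract can be illustrated with a minimal Python sketch. Only the grid size, codebook size, 936-byte keyframe size, and 200-byte budget come from the abstract. Everything else is an assumption made for illustration and is not the authors' implementation: the function names, the delta wire format (one position index plus one token ID per update), the drift measure (fraction of tokens differing from the sender's mirror of the receiver state), the default threshold of 0.5, and the omission of packet loss and of any keyframe fragmentation.

```python
import math

GRID_H, GRID_W = 18, 32                  # token grid (from the abstract)
TOKENS_PER_FRAME = GRID_H * GRID_W       # 576 tokens per frame
CODEBOOK_SIZE = 8192                     # from the abstract

# Fixed-length coding: 13 bits per token ID, so 576 * 13 / 8 = 936 bytes.
BITS_PER_TOKEN = (CODEBOOK_SIZE - 1).bit_length()
KEYFRAME_BYTES = math.ceil(TOKENS_PER_FRAME * BITS_PER_TOKEN / 8)

# ASSUMPTION: each delta entry carries a grid position plus a token ID.
BITS_PER_POSITION = (TOKENS_PER_FRAME - 1).bit_length()   # 10 bits
DELTA_ENTRY_BITS = BITS_PER_POSITION + BITS_PER_TOKEN     # 23 bits


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def hamming(a, b):
    """Number of grid positions whose token IDs differ."""
    return sum(x != y for x, y in zip(a, b))


def encode_step(current, mirror, codebook, budget_bytes=200, drift_threshold=0.5):
    """Choose the next message given the sender's mirror of the receiver state.

    Returns ("key", tokens) or ("delta", [(position, token), ...]).
    drift_threshold is a hypothetical value, not taken from the paper.
    """
    changed = [i for i in range(len(current)) if current[i] != mirror[i]]
    # ASSUMPTION: drift is the normalized Hamming distance to the mirror.
    if len(changed) / len(current) > drift_threshold:
        return ("key", list(current))
    # Send the changes that moved farthest in embedding space first.
    ranked = sorted(
        changed,
        key=lambda i: cosine_distance(codebook[current[i]], codebook[mirror[i]]),
        reverse=True,
    )
    max_updates = (budget_bytes * 8) // DELTA_ENTRY_BITS
    return ("delta", [(i, current[i]) for i in ranked[:max_updates]])


def apply_message(state, message):
    """Receiver side: return the updated token state after one message."""
    kind, payload = message
    if kind == "key":
        return list(payload)
    updated = list(state)
    for position, token in payload:
        updated[position] = token
    return updated
```

Under the assumed 23-bit delta entry, a 200-byte message holds at most 69 token updates, so when more tokens change than fit in the budget, the ranking decides which positions remain stale until a later message or keyframe.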
