Towards Generative Predictive Display for Vision-Based Teleoperation: A Zero-Shot Benchmark of Off-the-Shelf Video Models

Aws Khalil; Jaerock Kwon

arXiv:2605.09670 · cs.RO · May 12, 2026

TL;DR

This paper benchmarks off-the-shelf generative video models for predictive display in teleoperation and finds that, without task-specific tuning, none yet delivers real-time, low-error, short-horizon prediction.

Contribution

The paper introduces a zero-shot benchmarking pipeline for evaluating generative video models in teleoperation scenarios, highlighting the gap between general-purpose models and the requirements of practical predictive display.

Findings

01

No model achieves low error, real-time inference, and stable predictions simultaneously.

02

Increasing model size or resolution yields limited gains and in some cases makes results worse.

03

Practical deployment requires adaptation or optimization beyond off-the-shelf models.

Abstract

Teleoperation systems are fundamentally limited by communication latency, which degrades situational awareness and control performance. Predictive display aims to mitigate this limitation by presenting an estimate of the current visual state rather than delayed observations. While recent advances in generative video models enable high-quality video synthesis, their suitability for latency-sensitive predictive display remains unclear. This paper presents a zero-shot benchmark of off-the-shelf generative video models for short-horizon predictive display, without task-specific fine-tuning. We formulate the problem as rollout-based future frame prediction and develop a unified benchmarking pipeline using simulated driving data from the CARLA simulator. Five publicly released video models spanning transformer-based and diffusion-based families are evaluated across two resolutions and two…
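The rollout-based formulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the predictor `copy_last_frame`, the helper names, the tiny synthetic frames, and the choice of mean squared error as the error measure are all assumptions made for illustration. The sketch feeds each predicted frame back into the model's history, records per-step error against ground truth, and measures average time per predicted frame.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical frame type: a grayscale image as rows of pixel intensities.
Frame = List[List[float]]


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two equally sized frames (assumed metric)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def copy_last_frame(history: Sequence[Frame]) -> Frame:
    """Stand-in predictor (not a real video model): repeat the latest frame."""
    return [row[:] for row in history[-1]]


def evaluate_rollout(
    predict_next: Callable[[Sequence[Frame]], Frame],
    context: Sequence[Frame],
    future: Sequence[Frame],
) -> Tuple[List[float], float]:
    """Roll the predictor forward over len(future) steps.

    Returns the per-step errors and the mean wall-clock seconds per frame.
    """
    history = list(context)
    errors: List[float] = []
    start = time.perf_counter()
    for target in future:
        pred = predict_next(history)
        errors.append(mse(pred, target))
        # Autoregressive step: the model sees its own output, not ground truth.
        history.append(pred)
    elapsed = time.perf_counter() - start
    return errors, elapsed / len(future)


def make_frame(value: float, height: int = 2, width: int = 2) -> Frame:
    """Build a constant-intensity synthetic frame for the demo."""
    return [[value] * width for _ in range(height)]


# Synthetic sequence whose brightness rises by 1.0 per frame.
context = [make_frame(0.0), make_frame(1.0)]
future = [make_frame(2.0), make_frame(3.0)]

errors, sec_per_frame = evaluate_rollout(copy_last_frame, context, future)
print(errors)  # [1.0, 4.0]
```

With this stand-in predictor the error grows from 1.0 to 4.0 across the two-step horizon, which is the kind of drift a rollout evaluation exposes. Comparing `sec_per_frame` against the frame interval of the video stream would give a rough real-time check; the timing here also includes the error computation, so a careful benchmark would time the model call alone.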

Code & Models

Repositories

https://bimilab.github.io/paper-GenPD