TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Héctor Carrión; Yutong Bai; Víctor A. Hernández Castro; Kishan Panaganti; Ayush Zenith; Matthew Trang; Tony Zhang; Pietro Perona; Jitendra Malik

arXiv:2506.11302·cs.CV·June 23, 2025

TL;DR

This paper introduces STRIDE, a spatio-temporal road image dataset built from 360-degree panoramic imagery, and TARDIS, a transformer-based generative world model trained on STRIDE that models environment dynamics across both space and time for agentic tasks.

Contribution

The paper presents a new dataset (STRIDE) and a unified autoregressive transformer world model (TARDIS) for modeling spatio-temporal environment dynamics in autonomous systems.

Findings

01 Robust performance in image synthesis and instruction following.

02 State-of-the-art results in georeferencing tasks.

03 Demonstrates potential for generalist autonomous agents.

Abstract

World models aim to simulate environments and enable effective agent behavior. However, modeling real-world environments presents unique challenges as they dynamically change across both space and, crucially, time. To capture these composed dynamics, we introduce a Spatio-Temporal Road Image Dataset for Exploration (STRIDE) permuting 360-degree panoramic imagery into rich interconnected observation, state and action nodes. Leveraging this structure, we can simultaneously model the relationship between egocentric views, positional coordinates, and movement commands across both space and time. We benchmark this dataset via TARDIS, a transformer-based generative world model that integrates spatial and temporal dynamics through a unified autoregressive framework trained on STRIDE. We demonstrate robust performance across a range of agentic tasks such as controllable photorealistic image…
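
The node structure described in the abstract can be pictured with a small sketch. This is an illustration only, not the STRIDE schema or the TARDIS tokenizer: the class names `Node` and `Action`, their fields, the action vocabulary, and the observation/state/action ordering are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical node: what the agent sees, plus where and when it was seen."""
    image_id: str       # placeholder for an egocentric view taken from a panorama
    lat: float          # positional coordinates (assumed latitude/longitude)
    lon: float
    heading_deg: float  # viewing direction in degrees (assumed field)
    timestamp: str      # capture time, which gives the temporal axis


@dataclass(frozen=True)
class Action:
    """Hypothetical movement command linking one node to the next."""
    kind: str     # e.g. "move", "turn", "wait" (assumed vocabulary)
    value: float  # magnitude of the command, units depend on `kind`


def interleave(nodes: List[Node], actions: List[Action]) -> List[Tuple[str, object]]:
    """Flatten a trajectory into one sequence an autoregressive model could read.

    Assumed ordering: obs, state, action, obs, state, action, ..., obs, state.
    """
    if len(nodes) != len(actions) + 1:
        raise ValueError("need exactly one action between consecutive nodes")
    seq: List[Tuple[str, object]] = []
    for i, node in enumerate(nodes):
        seq.append(("obs", node.image_id))
        seq.append(("state", (node.lat, node.lon, node.heading_deg, node.timestamp)))
        if i < len(actions):
            seq.append(("action", (actions[i].kind, actions[i].value)))
    return seq


# Example trajectory with made-up values: two nodes joined by one forward move.
example = interleave(
    [Node("pano_0", 37.0, -122.0, 90.0, "2019-06"),
     Node("pano_1", 37.0, -121.9999, 90.0, "2019-06")],
    [Action("move", 10.0)],
)
```

Placing observations, states, and actions in one sequence is what lets a single next-token model serve several tasks in this sketch: predicting the `obs` entry corresponds to image synthesis, and predicting the `state` entry corresponds to georeferencing.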

Code & Models

Datasets

Tera-AI/STRIDE
dataset · 1.2k dl

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications