FAR-Drive: Frame-AutoRegressive Video Generation in Closed-Loop Autonomous Driving

Yaoru Li; Federico Landi; Marco Godi; Xin Jin; Ruiju Fu; Yufei Ma; Muyang Sun; Heyu Si; Qi Guo

arXiv:2603.14938 · cs.CV · March 17, 2026

Open Access

TL;DR

FAR-Drive is a frame-level autoregressive video generation framework for autonomous driving simulation. It targets long-horizon consistency, multi-view generation, and low-latency inference, with the aim of making training environments more reliable and interactive.

Contribution

The paper presents a multi-view diffusion transformer with structured control and a two-stage training strategy to improve closed-loop autonomous driving simulation.

Findings

01 Achieves state-of-the-art performance on the nuScenes dataset

02 Maintains sub-second latency on a single GPU

03 Enhances long-horizon consistency and robustness in simulation

Abstract

Despite rapid progress in autonomous driving, reliable training and evaluation of driving systems remain fundamentally constrained by the lack of scalable and interactive simulation environments. Recent generative video models achieve remarkable visual fidelity, yet most operate in open-loop settings and fail to support fine-grained frame-level interaction between agent actions and environment evolution. Building a learning-based closed-loop simulator for autonomous driving poses three major challenges: maintaining long-horizon temporal and cross-view consistency, mitigating autoregressive degradation under iterative self-conditioning, and satisfying low-latency inference constraints. In this work, we propose FAR-Drive, a frame-level autoregressive video generation framework for autonomous driving. We introduce a multi-view diffusion transformer with fine-grained structured control,…
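
To make the open-loop versus closed-loop distinction in the abstract concrete, here is a minimal Python sketch of a frame-level closed-loop rollout. It is not the FAR-Drive implementation: `toy_world_model`, `toy_policy`, `closed_loop_rollout`, the two-number "frame", and the `context_len` parameter are all hypothetical stand-ins chosen for illustration. The only point it shows is the control flow, in which each generated frame is fed back to the agent, and the agent's next action conditions the next generated frame.

```python
from collections import deque
from typing import Callable, List, Tuple

# Toy stand-ins (assumptions, not from the paper):
# a "frame" is just (lateral offset, heading); an action is a steering value.
Frame = Tuple[float, float]
Action = float


def toy_world_model(context: List[Frame], action: Action) -> Frame:
    """Hypothetical placeholder for a learned frame generator.

    A real model would generate multi-view images; this one only
    updates a two-number state from the latest frame and the action.
    """
    offset, heading = context[-1]
    heading = heading + 0.1 * action
    offset = offset + heading
    return (offset, heading)


def toy_policy(frame: Frame) -> Action:
    """Hypothetical driving agent: steer back toward zero offset."""
    offset, heading = frame
    return -1.0 * offset - 2.0 * heading


def closed_loop_rollout(
    first_frame: Frame,
    policy: Callable[[Frame], Action],
    world_model: Callable[[List[Frame], Action], Frame],
    steps: int,
    context_len: int = 4,
) -> List[Frame]:
    """Alternate agent action and frame generation, one frame at a time."""
    # Sliding window of recent frames; after the first step these are the
    # model's own outputs, which is the self-conditioning the abstract mentions.
    context = deque([first_frame], maxlen=context_len)
    frames = [first_frame]
    for _ in range(steps):
        action = policy(context[-1])                # agent reacts to latest frame
        nxt = world_model(list(context), action)    # next frame depends on action
        context.append(nxt)
        frames.append(nxt)
    return frames


if __name__ == "__main__":
    trajectory = closed_loop_rollout((1.0, 0.0), toy_policy, toy_world_model, steps=50)
    print(len(trajectory), trajectory[-1])
```

In an open-loop setting the whole clip would be generated from a fixed action sequence, so the agent could not influence later frames. Here the action is recomputed after every frame, which is also why per-frame latency matters for this kind of simulator.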

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Advanced Vision and Imaging