ARCON: Advancing Auto-Regressive Continuation for Driving Videos

Ruibo Ming; Jingwei Wu; Zhewei Huang; Zhuoxuan Ju; Jianming HU; Lihui Peng; Shuchang Zhou

arXiv:2412.03758 · cs.CV · February 27, 2025

TL;DR

ARCON is a driving-video continuation scheme in which a large vision model (LVM) alternates between generating semantic and RGB tokens. This yields highly consistent RGB frames and semantic maps and enables long video generation in autonomous driving scenarios.

Contribution

The paper presents ARCON, a video continuation scheme in which alternating semantic and RGB tokens lets the LVM explicitly learn high-level structural information, while an optical flow-based texture stitching step improves visual quality.
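The page does not describe how semantic and RGB tokens are laid out in the sequence. As a minimal sketch, assuming each frame's semantic map and RGB image are each tokenized into a fixed number `n` of discrete codebook indices, and that two hypothetical marker ids (`SEM_START`, `RGB_START`) separate the modalities, per-frame token lists could be interleaved into one autoregressive sequence like this:

```python
from typing import List, Tuple

# Hypothetical marker ids, placed just past an assumed 8192-entry codebook.
SEM_START = 8192
RGB_START = 8193


def interleave(sem_frames: List[List[int]], rgb_frames: List[List[int]]) -> List[int]:
    """Build one sequence that alternates, frame by frame, between the
    semantic-map tokens and the RGB tokens of that frame."""
    if len(sem_frames) != len(rgb_frames):
        raise ValueError("expected one semantic map per RGB frame")
    seq: List[int] = []
    for sem, rgb in zip(sem_frames, rgb_frames):
        seq.append(SEM_START)
        seq.extend(sem)   # semantic tokens come first ...
        seq.append(RGB_START)
        seq.extend(rgb)   # ... then the RGB tokens of the same frame
    return seq


def deinterleave(seq: List[int], n: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Invert interleave(), assuming every map and image has exactly n tokens."""
    block = 2 * n + 2  # marker + n semantic tokens + marker + n RGB tokens
    if len(seq) % block != 0:
        raise ValueError("sequence length is not a whole number of frames")
    sem_frames: List[List[int]] = []
    rgb_frames: List[List[int]] = []
    for start in range(0, len(seq), block):
        chunk = seq[start:start + block]
        if chunk[0] != SEM_START or chunk[n + 1] != RGB_START:
            raise ValueError(f"malformed frame block at offset {start}")
        sem_frames.append(chunk[1:n + 1])
        rgb_frames.append(chunk[n + 2:])
    return sem_frames, rgb_frames
```

With this layout, a next-token model conditions each frame's RGB tokens on the semantic map it has just produced. This is one plausible reading of the alternation described above, not the paper's confirmed format.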

Findings

1. High consistency in the generated RGB images and semantic maps.
2. Effective long video generation in autonomous driving scenarios.
3. Enhanced visual quality through optical flow-based stitching.

Abstract

Recent advances in auto-regressive large language models (LLMs) have led to their application in video generation. This paper explores the use of Large Vision Models (LVMs) for video continuation, a task essential for building world models and predicting future frames. We introduce ARCON, a scheme that alternates between generating semantic and RGB tokens, allowing the LVM to explicitly learn high-level structural information about the video. We find that the generated RGB images and semantic maps are highly consistent without any special design. Moreover, we employ an optical flow-based texture stitching method to enhance visual quality. Experiments in autonomous driving scenarios show that our model can consistently generate long videos.
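The abstract does not detail the optical flow-based texture stitching step. The sketch below shows only a generic building block that such a step could use, and it is not ARCON's method. It backward-warps a reference frame with a dense per-pixel flow field using nearest-neighbour sampling, then blends the warped texture into a generated frame wherever the warp lands inside the image. Images are small grayscale lists of lists. The flow field, the `alpha` blending weight, and the function names are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]                 # grayscale, H rows x W columns
Flow = List[List[Tuple[float, float]]]    # per-pixel (dx, dy) displacement


def backward_warp(src: Image, flow: Flow) -> List[List[Optional[float]]]:
    """For each target pixel (y, x), sample src at (y + dy, x + dx) with
    nearest-neighbour rounding; mark samples that fall outside src as None."""
    h, w = len(src), len(src[0])
    out: List[List[Optional[float]]] = []
    for y in range(h):
        row: List[Optional[float]] = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            row.append(src[sy][sx] if 0 <= sx < w and 0 <= sy < h else None)
        out.append(row)
    return out


def stitch(generated: Image, warped: List[List[Optional[float]]], alpha: float = 1.0) -> Image:
    """Blend warped texture into the generated frame where the warp is valid;
    keep the generated pixel where it is not (None)."""
    return [
        [g if wv is None else alpha * wv + (1.0 - alpha) * g
         for g, wv in zip(g_row, w_row)]
        for g_row, w_row in zip(generated, warped)
    ]
```

Here `alpha=1.0` copies the reference texture wherever it is available. A practical system would typically also mask out pixels where the flow is unreliable (for example, occluded regions), which this sketch does not attempt.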

Taxonomy

Topics: Image Enhancement Techniques · Advanced Vision and Imaging · Advanced Image Processing Techniques