ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling

Jiayi Zhu; Jianing Zhang; Yiying Yang; Wei Cheng; Xiaoyun Yuan

arXiv:2603.02697·cs.CV·March 4, 2026

TL;DR

ShareVerse introduces a multi-agent video generation framework that models shared worlds with multi-view consistency and spatio-temporal coherence, leveraging large video models and a new multi-agent dataset.

Contribution

It presents a novel multi-view spatial concatenation strategy and cross-agent attention integration for consistent shared world modeling in multi-agent scenarios.

Findings

01. Supports 49-frame large-scale video generation.

02. Achieves accurate dynamic agent positioning.

03. Ensures shared world consistency across agents.

Abstract

This paper presents ShareVerse, a video generation framework enabling multi-agent shared world modeling, addressing the gap in existing works that lack support for unified shared world construction with multi-agent interaction. ShareVerse leverages the generation capability of large video models and integrates three key innovations: 1) A dataset for large-scale multi-agent interactive world modeling is built on the CARLA simulation platform, featuring diverse scenes, weather conditions, and interactive trajectories with paired multi-view videos (front/rear/left/right views per agent) and camera data. 2) We propose a spatial concatenation strategy for four-view videos of independent agents to model a broader environment and to ensure internal multi-view geometric consistency. 3) We integrate cross-agent attention blocks into the pretrained video model, which enable interactive…
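
As a rough illustration of innovations 2) and 3), the sketch below shows one way four per-agent views could be joined into a single wide video, and one way tokens from one agent could attend to tokens from another. Everything here is an assumption made for illustration: the abstract does not state the concatenation layout (a horizontal strip is used here), the truncated text does not describe the attention block, and the names `VIEWS`, `concat_views`, and `cross_agent_attention` are hypothetical. The attention is plain single-head scaled dot-product attention with no learned projections, not the paper's module.

```python
import numpy as np

# Hypothetical view order; the abstract only lists the four views.
VIEWS = ("front", "rear", "left", "right")


def concat_views(views: dict[str, np.ndarray]) -> np.ndarray:
    """Join one agent's four view videos side by side.

    Each video has shape (T, H, W, C). The horizontal-strip layout is an
    assumption; the source does not specify the arrangement.
    """
    clips = [views[name] for name in VIEWS]
    shapes = {clip.shape for clip in clips}
    if len(shapes) != 1:
        raise ValueError(f"all views must share one shape, got {shapes}")
    # Concatenate along the width axis: (T, H, 4 * W, C).
    return np.concatenate(clips, axis=2)


def cross_agent_attention(tokens_a: np.ndarray, tokens_b: np.ndarray) -> np.ndarray:
    """Let agent A's tokens (N_a, D) attend to agent B's tokens (N_b, D).

    Generic single-head attention without learned weights, shown only to
    make the idea of cross-agent information flow concrete.
    """
    dim = tokens_a.shape[-1]
    scores = tokens_a @ tokens_b.T / np.sqrt(dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Residual connection keeps agent A's own features.
    return tokens_a + weights @ tokens_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 49 frames matches the clip length listed under Findings; sizes are toy values.
    agent_views = {name: rng.random((49, 4, 6, 3)) for name in VIEWS}
    print(concat_views(agent_views).shape)  # (49, 4, 24, 3)
```

In this toy setup each agent contributes one wide canvas, so a video model sees all four views of an agent in a single frame, while the attention step is the only place where information crosses between agents.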

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Multimodal Machine Learning Applications