Map2Video: Street View Imagery Driven AI Video Generation
Hye-Young Jo, Mose Sakashita, Aditi Mishra, Ryo Suzuki, Koichiro Niinuma, Aakar Gupta

TL;DR
Map2Video is an AI video generation tool that combines street view imagery with familiar filmmaking practices to produce spatially consistent videos with improved control and realism, addressing the mismatched characters and backgrounds that make clips from current tools hard to assemble into coherent sequences.
Contribution
The paper introduces Map2Video, which grounds AI video generation in real-world geographies through street view imagery, enabling spatially consistent, controllable video creation modeled on filmmaking workflows such as location scouting and rehearsal.
Findings
Higher spatial accuracy compared to baseline methods
Reduced cognitive effort for users
Enhanced controllability for scene replication and creativity
Abstract
AI video generation has lowered barriers to video creation, but current tools still struggle with inconsistency. Filmmakers often find that clips fail to match characters and backgrounds, making it difficult to build coherent sequences. A formative study with filmmakers highlighted challenges in shot composition, character motion, and camera control. We present Map2Video, a street view imagery-driven AI video generation tool grounded in real-world geographies. The system integrates Unity and ComfyUI with the VACE video generation model, as well as OpenStreetMap and Mapillary for street view imagery. Drawing on familiar filmmaking practices such as location scouting and rehearsal, Map2Video enables users to choose map locations, position actors and cameras in street view imagery, sketch movement paths, refine camera motion, and generate spatially consistent videos. We evaluated Map2Video…
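As an illustration of the "sketch movement paths" step, the following is a minimal sketch of how a hand-drawn path might be turned into one actor or camera position per video frame. It is not taken from the paper: the function name `resample_path`, the 2D coordinate representation, and the choice of even arc-length spacing are all assumptions made for this example, and Map2Video's actual Unity and ComfyUI pipeline may work quite differently.

```python
import math


def resample_path(points, n_frames):
    """Return n_frames (x, y) positions spaced evenly by arc length.

    `points` is a sketched polyline as a list of (x, y) tuples.
    Hypothetical helper for illustration; not part of Map2Video.
    """
    if len(points) < 2 or n_frames < 2:
        raise ValueError("need at least two points and two frames")

    # Cumulative distance travelled at each sketched vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    # Degenerate sketch: every vertex is the same point.
    if total == 0:
        return [points[0]] * n_frames

    out = []
    seg = 0  # index of the segment that contains the current target
    for i in range(n_frames):
        target = total * i / (n_frames - 1)
        # Advance to the segment containing the target distance.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        # Linear interpolation within the segment.
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


if __name__ == "__main__":
    # An L-shaped sketch sampled at five frames.
    for pos in resample_path([(0, 0), (4, 0), (4, 4)], 5):
        print(pos)
```

Spacing samples by arc length rather than by vertex index keeps the apparent speed constant even when the sketch has unevenly placed points; a real tool would likely add easing or user-adjustable timing on top of this.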
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
