Turning Text and Imagery into Captivating Visual Video
Mingming Wang, Elijah Miller

TL;DR
This paper presents a generative model-based approach for creating multi-perspective architectural videos from images and text, enhancing design visualization and communication.
Contribution
It introduces a novel application of generative models to architectural visualization, enabling multi-view video synthesis from a single image and video synthesis directly from a textual description.
Findings
Enables consistent multi-view architectural videos from single images
Generates design videos directly from textual descriptions
Improves speed and creativity in architectural visualization
Abstract
The ability to visualize a structure from multiple perspectives is crucial for comprehensive planning and presentation. This paper introduces an advanced application of generative models, akin to Stable Video Diffusion, tailored for architectural visualization. We explore the potential of these models to create consistent multi-perspective videos of buildings from single images and to generate design videos directly from textual descriptions. The proposed method enhances the design process by offering rapid prototyping, cost and time efficiency, and an enriched creative space for architects and designers. By harnessing the power of AI, our approach not only accelerates the visualization of architectural concepts but also enables a more interactive and immersive experience for clients and stakeholders. This advancement in architectural visualization represents a significant leap forward,…
Taxonomy
Topics: Subtitles and Audiovisual Media · Video Analysis and Summarization · Multimedia Communication and Technology
