Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop

Zhaofang Qian; Abolfazl Sharifi; Tucker Carroll; Ser-Nam Lim

arXiv:2411.18644 · cs.CV · December 2, 2024

Open Access

TL;DR

Scene Co-pilot integrates large language models with procedural 3D scene generation to produce photorealistic videos with improved consistency and physical accuracy, enabling user-friendly, customizable scene creation.

Contribution

The paper introduces Scene Co-pilot, a novel framework combining LLMs with procedural 3D scene generation and human-in-the-loop control for improved video synthesis.

Findings

01 Effective scene customization demonstrated

02 Enhanced video quality with fewer artifacts

03 User control via Blender UI improves usability

Abstract

Video generation has achieved impressive quality, but it still suffers from artifacts such as temporal inconsistency and violation of physical laws. Leveraging 3D scenes can fundamentally resolve these issues by providing precise control over scene entities. To facilitate the easy generation of diverse photorealistic scenes, we propose Scene Copilot, a framework combining large language models (LLMs) with a procedural 3D scene generator. Specifically, Scene Copilot consists of Scene Codex, BlenderGPT, and Human in the loop. Scene Codex is designed to translate textual user input into commands understandable by the 3D scene generator. BlenderGPT provides users with an intuitive and direct way to precisely control the generated 3D scene and the final output video. Furthermore, users can utilize Blender UI to receive instant visual feedback. Additionally, we have curated a procedural…

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Data Visualization and Analytics