Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
Zhaofang Qian, Abolfazl Sharifi, Tucker Carroll, Ser-Nam Lim

TL;DR
Scene Co-pilot integrates large language models with procedural 3D scene generation to produce photorealistic videos with improved consistency and physical accuracy, enabling user-friendly, customizable scene creation.
Contribution
The paper introduces Scene Co-pilot, a novel framework combining LLMs with procedural 3D scene generation and human-in-the-loop control for improved video synthesis.
Findings
Effective scene customization demonstrated
Enhanced video quality with fewer artifacts
User control via Blender UI improves usability
Abstract
Video generation has achieved impressive quality, but it still suffers from artifacts such as temporal inconsistency and violation of physical laws. Leveraging 3D scenes can fundamentally resolve these issues by providing precise control over scene entities. To facilitate the easy generation of diverse photorealistic scenes, we propose Scene Copilot, a framework combining large language models (LLMs) with a procedural 3D scene generator. Specifically, Scene Copilot consists of Scene Codex, BlenderGPT, and Human in the loop. Scene Codex is designed to translate textual user input into commands understandable by the 3D scene generator. BlenderGPT provides users with an intuitive and direct way to precisely control the generated 3D scene and the final output video. Furthermore, users can use the Blender UI to receive instant visual feedback. Additionally, we have curated a procedural…
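To make the division of labor in the abstract concrete, here is a minimal sketch of a Scene Codex-style translation step followed by a human-in-the-loop edit. It is not the authors' implementation. The command schema (`GeneratorCommand`), the command names (`set_biome`, `set_weather`, `set_time_of_day`, `set_camera`) and both helper functions are hypothetical. The abstract says an LLM performs the translation, so the keyword table here is only a stand-in that lets the example run without a model. `apply_edit` plays the role of a user (or BlenderGPT) adjusting one parameter of the generated scene.

```python
import re
from dataclasses import dataclass, field


# Hypothetical command format; the real Scene Codex output format is not described here.
@dataclass
class GeneratorCommand:
    name: str
    params: dict = field(default_factory=dict)


# Hypothetical keyword table standing in for the LLM translation step.
KEYWORD_RULES = {
    "forest": GeneratorCommand("set_biome", {"biome": "forest"}),
    "desert": GeneratorCommand("set_biome", {"biome": "desert"}),
    "snow": GeneratorCommand("set_weather", {"weather": "snow"}),
    "rain": GeneratorCommand("set_weather", {"weather": "rain"}),
    "night": GeneratorCommand("set_time_of_day", {"hour": 22}),
}


def translate(prompt: str) -> list:
    """Map a free-text prompt to generator commands, in prompt order, without duplicates."""
    commands = []
    for word in re.findall(r"[a-z]+", prompt.lower()):
        rule = KEYWORD_RULES.get(word)
        if rule is None:
            continue
        # Copy so later edits never mutate the shared rule table.
        cmd = GeneratorCommand(rule.name, dict(rule.params))
        if cmd not in commands:
            commands.append(cmd)
    return commands


def apply_edit(commands: list, name: str, **params) -> list:
    """Human-in-the-loop edit: merge new params into a command, or append it if absent."""
    updated, found = [], False
    for cmd in commands:
        if cmd.name == name:
            updated.append(GeneratorCommand(name, {**cmd.params, **params}))
            found = True
        else:
            updated.append(cmd)
    if not found:
        updated.append(GeneratorCommand(name, dict(params)))
    return updated


if __name__ == "__main__":
    cmds = translate("A forest at night with snow.")
    print(cmds)
    # The user decides the scene should be at dusk rather than late night.
    print(apply_edit(cmds, "set_time_of_day", hour=20))
```

Keeping the translated output as a list of small structured commands, rather than free text, is one plausible way to make every later correction a targeted parameter change instead of a full regeneration. Whether Scene Copilot represents commands this way is not stated in the abstract.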
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Data Visualization and Analytics
