TL;DR
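C2R (Coarse-to-Real) is a generative rendering framework that synthesizes realistic urban crowd videos from coarse 3D simulations. The coarse renderings control scene layout, camera motion, and human trajectories, so scenes can be generated controllably and at scale from minimal input.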
Contribution
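The paper introduces a two-stage synthetic-real domain-hedging strategy that compensates for the lack of paired coarse-simulation and real-video training data. The first stage learns a strong generative prior from large-scale real footage, and the second stage adds controllability, so the model can generate realistic, controllable videos from coarse simulations.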
Findings
Supports coarse-to-fine control of scene generation.
Generalizes across diverse CG and game inputs.
Produces temporally consistent, realistic urban scene videos.
Abstract
Traditional rendering pipelines rely on complex assets, accurate materials and lighting, and substantial computational resources to produce realistic imagery, yet they still face challenges in scalability and realism for populated dynamic scenes. We present C2R (Coarse-to-Real), a generative rendering framework that synthesizes real-style urban crowd videos from coarse 3D simulations. Our approach uses coarse 3D renderings to explicitly control scene layout, camera motion, and human trajectories, while a learned neural renderer generates realistic appearance, lighting, and fine-scale dynamics guided by text prompts. To overcome the lack of paired training data between coarse simulations and real videos, we adopt a two-stage synthetic-real domain-hedging strategy that first learns a strong generative prior from large-scale real footage, and then introduces controllability by using a…
