Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars
Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

TL;DR
This paper introduces a learning-based system that generates diverse, photorealistic 3D indoor scenes and 2D images with detailed ground truth, aiding training and evaluation of computer vision models.
Contribution
It presents a novel pipeline using stochastic grammars and physics-based rendering to produce customizable, high-quality synthetic datasets with per-pixel ground truth for scene understanding tasks.
Findings
Enhanced depth and surface normal prediction accuracy
Improved semantic segmentation performance
Provided controllable benchmarks for model diagnostics
Abstract
We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, we devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. Our pipeline is capable of synthesizing scene layouts with high diversity, and it is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated…
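As a rough illustration of how a stochastic grammar of this kind can yield many different scenes, the following is a minimal sketch of sampling from a toy And-Or graph. It is not the authors' implementation: the `Node` class, the `sample` function, the object categories, and the branching probabilities are all hypothetical, and the attributes and spatial relations of the attributed Spatial And-Or Graph described in the abstract are left out entirely. An And-node expands into all of its children, while an Or-node picks one alternative according to its branching probabilities.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a toy And-Or graph (hypothetical structure, for illustration)."""
    name: str
    kind: str  # "and", "or", or "terminal"
    children: List["Node"] = field(default_factory=list)
    probs: Optional[List[float]] = None  # branching probabilities, Or-nodes only


def sample(node: Node, rng: random.Random) -> List[str]:
    """Sample one parse of the grammar and return its terminal object names."""
    if node.kind == "terminal":
        return [node.name]
    if node.kind == "or":
        # An Or-node selects exactly one of its alternatives.
        child = rng.choices(node.children, weights=node.probs, k=1)[0]
        return sample(child, rng)
    # An And-node decomposes into all of its children.
    objects: List[str] = []
    for child in node.children:
        objects.extend(sample(child, rng))
    return objects


def terminal(name: str) -> Node:
    return Node(name, "terminal")


# Toy grammar; categories and probabilities are made up, not learned from data.
bed_group = Node("bed_group", "and", [
    terminal("bed"),
    Node("bedside", "or", [terminal("nightstand"), terminal("lamp")], [0.7, 0.3]),
])
desk_group = Node("desk_group", "and", [terminal("desk"), terminal("chair")])
room = Node("bedroom", "and", [
    bed_group,
    Node("work_area", "or", [desk_group, terminal("bookshelf")], [0.5, 0.5]),
])

scene = sample(room, random.Random(0))
print(scene)
```

Each call with a different seed produces a different list of objects, which is the sense in which a grammar can generate a large variety of layouts. In a fuller version, each sampled node would also carry attributes such as position and orientation.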
See pages 1-last of scenesynthesis2018ijcv.pdf
