Procedural Image Programs for Representation Learning
Manel Baradad, Chun-Fu Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola

TL;DR
This paper introduces a dataset of 21,000 procedural image programs for synthetic data generation. Each program generates a diverse set of images, so neural networks can be pre-trained for representation learning at scale with less reliance on real images.
Contribution
A large-scale dataset of procedural image programs, written as short code snippets, that makes synthetic training data easier to scale than a handful of expert-designed generative processes.
Findings
Reduces the gap between pre-training with real images and pre-training with procedurally generated images by 38%
Enables both supervised and unsupervised learning with synthetic data
Uses short, easily modified code snippets that run quickly with OpenGL to generate images
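One way to read the 38% figure in the first finding is as a relative reduction of an accuracy gap. The summary here does not give the exact definition, so the following is an assumption: the symbols A_real (downstream accuracy after pre-training on real images), A_prev (accuracy with earlier procedural data), and A_new (accuracy with the proposed programs) are illustrative names, not the paper's notation.

```latex
% Assumed definition of relative gap reduction (illustrative symbols only)
\[
  \text{gap reduction}
  \;=\;
  \frac{(A_{\text{real}} - A_{\text{prev}}) - (A_{\text{real}} - A_{\text{new}})}
       {A_{\text{real}} - A_{\text{prev}}}
  \;=\;
  \frac{A_{\text{new}} - A_{\text{prev}}}{A_{\text{real}} - A_{\text{prev}}}
\]
```

Under this reading, a value of 0.38 would mean the new data recovers 38% of the accuracy that earlier procedural data lost relative to real images.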
Abstract
Learning image representations using synthetic data allows training neural networks without some of the concerns associated with real images, such as privacy and bias. Existing work focuses on a handful of curated generative processes which require expert knowledge to design, making it hard to scale up. To overcome this, we propose training with a large dataset of twenty-one thousand programs, each one generating a diverse set of synthetic images. These programs are short code snippets, which are easy to modify and fast to execute using OpenGL. The proposed dataset can be used for both supervised and unsupervised representation learning, and reduces the gap between pre-training with real and procedurally generated images by 38%.
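To make the idea of "a program that generates a diverse set of images" concrete, here is a minimal sketch in pure Python. It is not one of the dataset's programs: those are OpenGL snippets, while this is a standard-library stand-in. The function names (`shade`, `sample_params`, `render`) and the sine-based pattern are hypothetical choices for illustration only.

```python
import math
import random


def shade(u, v, params):
    """Per-pixel colour function, loosely analogous to a fragment shader.

    u and v are pixel coordinates normalised to [0, 1); params holds the
    randomly sampled values that make one image differ from another.
    """
    fx, fy, phase, warp = params
    # Warp the coordinates so the pattern is not a plain set of stripes.
    x = u + warp * math.sin(2 * math.pi * fy * v)
    y = v + warp * math.cos(2 * math.pi * fx * u)
    # Each channel is a sine wave mapped from [-1, 1] into [0, 1].
    r = 0.5 + 0.5 * math.sin(2 * math.pi * fx * x + phase)
    g = 0.5 + 0.5 * math.sin(2 * math.pi * fy * y + 2 * phase)
    b = 0.5 + 0.5 * math.sin(2 * math.pi * (x + y) + 3 * phase)
    return (int(255 * r), int(255 * g), int(255 * b))


def sample_params(rng):
    """Draw the random parameters for one image (hypothetical ranges)."""
    return (
        rng.uniform(1.0, 8.0),          # horizontal frequency
        rng.uniform(1.0, 8.0),          # vertical frequency
        rng.uniform(0.0, 2 * math.pi),  # phase offset
        rng.uniform(0.0, 0.2),          # warp strength
    )


def render(seed, width=32, height=32):
    """Render one image as a height x width grid of (r, g, b) tuples."""
    rng = random.Random(seed)
    params = sample_params(rng)
    return [
        [shade(col / width, row / height, params) for col in range(width)]
        for row in range(height)
    ]


if __name__ == "__main__":
    # The same program yields a different image for each seed.
    for seed in range(3):
        image = render(seed, width=4, height=4)
        print(seed, image[0][0])
```

The same program produces a different image for every seed, which is the property the abstract relies on: one short snippet stands in for many training images. A real pipeline would run such code on the GPU rather than pixel by pixel in Python.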
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · AI in cancer detection
