Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
Ahmed Bourouis, Savas Ozkan, Andrea Maracani, Yi-Zhe Song, Mete Ozay

TL;DR
This paper introduces a method for generating geometrically consistent multi-view scenes from a single freehand sketch, addressing both the spatial distortion inherent in sketches and the lack of training data.
Contribution
It presents a new dataset, a transformer-based architecture with geometric inductive biases, and a supervision loss derived from structure-from-motion, enabling single-pass multi-view scene synthesis from sketches.
Findings
Outperforms state-of-the-art baselines in realism and geometric consistency.
Achieves over 60% improvement in FID score.
Provides up to 3.7× faster inference speed.
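For readers unfamiliar with the metric, the following is a brief sketch of how FID is usually defined, assuming the paper uses the standard Fréchet Inception Distance (the summary does not say which variant or feature extractor is used). Here \mu_r, \Sigma_r and \mu_g, \Sigma_g are the mean and covariance of Inception features for real and generated images; these symbols are notation introduced for this illustration. Lower FID is better, so an "improvement" is most plausibly a relative reduction against a baseline, although the summary does not state how the percentage is computed.

```latex
% Standard FID between real (r) and generated (g) feature distributions
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

% Assumed reading of "improvement": relative reduction versus a baseline
\Delta_{\mathrm{rel}} =
  \frac{\mathrm{FID}_{\mathrm{baseline}} - \mathrm{FID}_{\mathrm{ours}}}
       {\mathrm{FID}_{\mathrm{baseline}}}
```

Under this reading, an improvement of over 60% would correspond to \Delta_{\mathrm{rel}} > 0.6, meaning the reported FID is less than 40% of the baseline's.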
Abstract
We tackle a new problem: generating geometrically consistent multi-view scenes from a single freehand sketch. Freehand sketches are the most geometrically impoverished input one could offer a multi-view generator. They convey scene intent through abstract strokes while introducing spatial distortions that actively conflict with any consistent 3D interpretation. No prior method attempts this; existing multi-view approaches require photographs or text, while sketch-to-3D methods need multiple views or costly per-scene optimisation. We address three compounding challenges (absent training data, the need for geometric reasoning from distorted 2D input, and cross-view consistency) through three mutually reinforcing contributions: (i) a curated dataset of 9k sketch-to-multiview samples, constructed via an automated generation and filtering pipeline; (ii) Parallel Camera-Aware…
