Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

Ahmed Bourouis; Savas Ozkan; Andrea Maracani; Yi-Zhe Song; Mete Ozay

arXiv:2604.14302 · cs.CV · April 17, 2026

TL;DR

This paper introduces a method for generating geometrically consistent multi-view scenes from a single freehand sketch, addressing both the spatial distortion inherent in sketches and the lack of training data.

Contribution

It presents a new dataset, a transformer-based architecture with geometric inductive biases, and a supervision loss derived from structure-from-motion, enabling single-pass multi-view scene synthesis from sketches.

Findings

01. Outperforms state-of-the-art baselines in realism and geometric consistency.

02. Improves FID by over 60%.

03. Runs inference up to 3.7× faster.

Abstract

We tackle a new problem: generating geometrically consistent multi-view scenes from a single freehand sketch. Freehand sketches are the most geometrically impoverished input one could offer a multi-view generator. They convey scene intent through abstract strokes while introducing spatial distortions that actively conflict with any consistent 3D interpretation. No prior method attempts this; existing multi-view approaches require photographs or text, while sketch-to-3D methods need multiple views or costly per-scene optimisation. We address three compounding challenges (absent training data, the need for geometric reasoning from distorted 2D input, and cross-view consistency) through three mutually reinforcing contributions: (i) a curated dataset of $\sim$9k sketch-to-multiview samples, constructed via an automated generation and filtering pipeline; (ii) Parallel Camera-Aware…
