Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation

Riccardo Brioschi; Aleksandr Alekseev; Emanuele Nevali; Berkay Döner; Omar El Malki; Blagoj Mitrevski; Leandro Kieliger; Mark Collier; Andrii Maksai; Jesse Berent; Claudiu Musat; Efi Kokiopoulou

arXiv:2510.27632 · cs.CV · November 3, 2025

TL;DR

This paper introduces a multimodal transformer approach to sketch-guided layout generation. It shows that using user sketches as constraints improves layout quality and usability. It also contributes a scalable synthetic data generation method and an extensive evaluation.

Contribution

It presents a novel sketch-to-layout framework using multimodal transformers and synthetic data, advancing intuitive design guidance in graphic layout generation.

Findings

1. Outperforms state-of-the-art constraint-based methods
2. Effective use of synthetic sketches for training
3. Provides large-scale synthetic sketch datasets for research

Abstract

Graphic layout generation is a growing research area focusing on generating aesthetically pleasing layouts ranging from poster designs to documents. While recent research has explored ways to incorporate user constraints to guide the layout generation, these constraints often require complex specifications which reduce usability. We introduce an innovative approach exploiting user-provided sketches as intuitive constraints and we demonstrate empirically the effectiveness of this new guidance method, establishing the sketch-to-layout problem as a promising research direction, which is currently under-explored. To tackle the sketch-to-layout problem, we propose a multimodal transformer-based solution using the sketch and the content assets as inputs to produce high quality layouts. Since collecting sketch training data from human annotators to train our model is very costly, we introduce…

Taxonomy

Topics: Interactive and Immersive Displays · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis