DoodleFormer: Creative Sketch Drawing with Transformers

Ankan Kumar Bhunia; Salman Khan; Hisham Cholakkal; Rao Muhammad Anwer; Fahad Shahbaz Khan; Jorma Laaksonen; Michael Felsberg

arXiv:2112.03258·cs.CV·September 16, 2022

TL;DR

DoodleFormer is a novel two-stage transformer-based framework that generates diverse, realistic creative sketches by modeling global and local relations and incorporating fine details, outperforming existing methods on multiple datasets.

Contribution

It introduces a coarse-to-fine approach with graph-aware transformers and probabilistic decoding for diverse creative sketch generation.

Findings

01

Outperforms state-of-the-art on Creative Birds and Creative Creatures datasets.

02

Achieves an absolute improvement of 25 points in Fréchet Inception Distance (FID) on Creative Creatures.

03

Effective for text-to-sketch generation and sketch completion tasks.
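
For readers unfamiliar with the metric in finding 02: FID compares the mean and covariance of feature vectors from real and generated images, FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), and lower is better. The sketch below is a minimal NumPy version of that formula, not the paper's evaluation code. The function name `frechet_distance` and the random 4-dimensional features are assumptions for illustration; in practice the features come from a pretrained Inception network.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between two feature sets of shape (n_samples, dim)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((cov_r cov_g)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative
    # for covariance matrices (tiny numerical noise is clipped).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Stand-in features (assumption): random vectors instead of Inception activations.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))

fid_same = frechet_distance(real, real)           # identical sets: distance near 0
fid_shifted = frechet_distance(real, real + 2.0)  # same covariance, mean term 4 * 2.0**2
print(fid_same, fid_shifted)
```

Shifting every feature by a constant leaves the covariance unchanged, so only the mean term contributes. This is a quick sanity check for any FID implementation.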

Abstract

Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of coarse sketch composition followed by the incorporation of fine-details in the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each…
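
To illustrate why a probabilistic decoder helps diversity, the sketch below samples a part location from a distribution rather than emitting one fixed prediction. It is an illustration under stated assumptions, not the paper's decoder: the two-component diagonal Gaussian mixture, the parameter values, and the helper name `sample_part_center` are all hypothetical.

```python
import random

def sample_part_center(weights, means, stds, rng):
    """Draw one (x, y) center from a 2-D diagonal Gaussian mixture."""
    # Pick a mixture component according to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    (mx, my), (sx, sy) = means[k], stds[k]
    # Sample each coordinate independently from the chosen component.
    return rng.gauss(mx, sx), rng.gauss(my, sy)

# Hypothetical mixture parameters that a decoder might output for one part.
weights = [0.7, 0.3]
means = [(0.3, 0.4), (0.7, 0.6)]
stds = [(0.05, 0.05), (0.10, 0.10)]

rng = random.Random(0)
samples = [sample_part_center(weights, means, stds, rng) for _ in range(5)]
for x, y in samples:
    print(f"center = ({x:.3f}, {y:.3f})")
```

A deterministic decoder would return the same center on every call. Here, repeated draws from the same parameters give different placements, which is the source of variation between generated sketches.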