CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders

Kevin Frans; L.B. Soros; Olaf Witkowski

arXiv:2106.14843 · cs.CV · June 29, 2021 · 93 citations

TL;DR

CLIPDraw is a zero-shot text-to-drawing synthesis method that uses a pre-trained language-image encoder to generate vector stroke drawings aligned with natural language descriptions, producing drawings in diverse styles and at varying levels of complexity.

Contribution

This work introduces CLIPDraw, a method that synthesizes drawings from text without any training: it optimizes vector strokes against a pre-trained language-image encoder, yielding simple, recognizable images in diverse styles.

Findings

1. Produces drawings that satisfy ambiguous text in multiple ways
2. Reliably generates drawings in diverse artistic styles
3. Scales from simple to complex images as the stroke count increases

Abstract

This work presents CLIPDraw, an algorithm that synthesizes novel drawings based on natural language input. CLIPDraw does not require any training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods and highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse artistic styles, and scaling from simple to complex visual representations as stroke count is increased. Code for experimenting with the method is available at:…
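The abstract describes synthesis through optimization: stroke parameters are adjusted so that the encoder's embedding of the rendered drawing moves toward the embedding of the text. The sketch below mirrors only the structure of that loop, under loudly stated assumptions. The CLIP image and text encoders are replaced by a hypothetical fixed random linear projection and a random target vector; the differentiable vector renderer is replaced by a toy rasterizer that splats Gaussian blobs along cubic Bézier strokes; and gradients come from finite differences rather than backpropagation. All names (`render`, `encode`, `optimize`, `GRID`, and so on) are illustrative and do not come from the CLIPDraw code.

```python
import math
import random

# Toy sizes (illustrative assumptions, not values from the paper).
GRID = 12    # canvas is GRID x GRID pixels
DIM = 8      # embedding dimension of the stand-in encoder
SIGMA = 0.8  # radius of each Gaussian "ink" blob, in pixels


def bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return tuple(u**3 * a + 3 * u * u * t * b + 3 * u * t * t * c + t**3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def render(params, n_strokes, samples=8):
    """Toy soft rasterizer (hypothetical): splat Gaussian blobs along each stroke.

    Each stroke uses 8 numbers: four (x, y) control points.
    """
    img = [0.0] * (GRID * GRID)
    for s in range(n_strokes):
        pts = [(params[s * 8 + 2 * k], params[s * 8 + 2 * k + 1]) for k in range(4)]
        for i in range(samples):
            x, y = bezier(*pts, i / (samples - 1))
            for py in range(GRID):
                for px in range(GRID):
                    d2 = (px - x) ** 2 + (py - y) ** 2
                    img[py * GRID + px] += math.exp(-d2 / (2 * SIGMA**2))
    return img


def encode(img, W):
    """Hypothetical stand-in for CLIP's image encoder: a fixed linear map."""
    return [sum(w * v for w, v in zip(row, img)) for row in W]


def cosine(a, b):
    """Cosine similarity between two embeddings (the score being maximized)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def optimize(n_strokes=2, steps=30, lr=2.0, eps=1e-3, seed=0):
    """Gradient ascent on stroke parameters to raise image-text similarity."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(GRID * GRID)] for _ in range(DIM)]
    text_emb = [rng.gauss(0, 1) for _ in range(DIM)]  # stand-in for a text embedding
    params = [rng.uniform(2, GRID - 3) for _ in range(8 * n_strokes)]

    def score(p):
        return cosine(encode(render(p, n_strokes), W), text_emb)

    current = initial = score(params)
    for _ in range(steps):
        # Forward-difference estimate of d(score)/d(param) for every parameter.
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += eps
            grad.append((score(bumped) - current) / eps)
        # Backtracking: halve the step until the score improves (or give up).
        step = lr
        while step > 1e-4:
            cand = [p + step * g for p, g in zip(params, grad)]
            new = score(cand)
            if new > current:
                params, current = cand, new
                break
            step /= 2
    return initial, current, params
```

Calling `optimize()` returns the initial and final similarity together with the optimized control points. Because the backtracking step only accepts updates that raise the score, the final similarity is never below the initial one. Raising `n_strokes` enlarges the parameter vector by eight numbers per stroke, which loosely echoes the paper's observation that more strokes permit more complex drawings; in the real method the encoders are CLIP and gradients flow through a differentiable vector-graphics rasterizer rather than finite differences.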

Videos

CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders (SlidesLive)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Human Motion and Animation · Multimodal Machine Learning Applications