Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks

Victor Boutin; Rishav Mukherji; Aditya Agrawal; Sabine Muzellec; Thomas Fel; Thomas Serre; Rufin VanRullen

arXiv:2406.06079 · cs.CV · November 6, 2024

TL;DR

This paper investigates how different inductive biases in Latent Diffusion Models influence their ability to generate human-like sketches in one-shot drawing tasks, highlighting the importance of regularizations like redundancy reduction and prototypes.

Contribution

The paper systematically studies how various inductive biases shape the latent space of LDMs, demonstrating that specific regularizations produce near-human-like sketches in one-shot drawing tasks.

Findings

01

Redundancy reduction and prototype regularizations improve sketch recognizability.

02

LDMs with these regularizations produce sketches that better mimic human perception.

03

The gap between human and machine one-shot drawing performance is nearly closed.

Abstract

Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality) -- better mimicking human perception (as…
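
The abstract names redundancy reduction as one of the regularizers but does not spell out the objective. As a rough illustration only, the sketch below assumes a Barlow Twins-style loss, which is one common redundancy-reduction objective: it pushes the cross-correlation matrix between the embeddings of two augmented views toward the identity. The function name, the weight `lam`, and the use of NumPy are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def redundancy_reduction_loss(z_a, z_b, lam=5e-3, eps=1e-8):
    """Barlow Twins-style loss for two batches of embeddings of shape (N, D).

    Hypothetical helper for illustration; `lam` weights the off-diagonal term.
    """
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: matching dimensions should correlate perfectly.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy term: different dimensions should be decorrelated.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lam * off_diag


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss_same = redundancy_reduction_loss(z, z)
loss_unrelated = redundancy_reduction_loss(z, rng.normal(size=(256, 8)))
```

In this toy setup, two identical views give a loss close to zero, while two unrelated batches are penalized mainly through the diagonal (invariance) term. How such a term would be combined with an LDM's autoencoder training is not specified in the text above.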

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Convolution · Random Gaussian Blur · Average Pooling · Global Average Pooling · Max Pooling · Kaiming Initialization · Color Jitter · Normalized Temperature-scaled Cross Entropy Loss