On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition

Ching-Yuan Bai, Hsuan-Tien Lin, Colin Raffel, and Wendy Chih-wen Kan

arXiv:2106.03062·cs.LG·June 8, 2021



TL;DR

This paper critically evaluates the reliability of common metrics for generative models by analyzing a large-scale competition, revealing widespread unintentional memorization and proposing a new metric, MiFID, to better assess genuine generative quality.

Contribution

It introduces MiFID, a memorization-aware metric, and provides a comprehensive analysis of memorization issues in generative models through a large competition and manual inspection.

Findings

01. Unintentional memorization is prevalent in popular generative models.

02. MiFID effectively detects memorization in generative outputs.

03. Analysis of top models reveals common forms of memorization.

Abstract

Many recent developments on generative models for natural images have relied on heuristically-motivated metrics that can be easily gamed by memorizing a small sample from the true distribution or training a model directly to improve the metric. In this work, we critically evaluate the gameability of these metrics by designing and deploying a generative modeling competition. Our competition received over 11000 submitted models. The competitiveness between participants allowed us to investigate both intentional and unintentional memorization in generative modeling. To detect intentional memorization, we propose the "Memorization-Informed Fréchet Inception Distance" (MiFID) as a new memorization-aware metric and design benchmark procedures to ensure that winning submissions made genuine improvements in perceptual quality. Furthermore, we manually inspect the code for the 1000…
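
The abstract names MiFID but does not state its formula. The following is a minimal sketch of one way a memorization-aware FID could be computed, under these assumptions (none of them given on this page): images are already embedded as feature vectors; the memorization distance is the mean, over generated samples, of the smallest cosine distance to any training sample; and when that distance falls below a threshold `tau`, FID is divided by it as a penalty. The function names, the default `tau`, and the `eps` constant are hypothetical choices for illustration.

```python
import numpy as np


def frechet_distance(feat_a, feat_b):
    """Fréchet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    # tr(sqrt(cov_a cov_b)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD inputs.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(
        np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt
    )


def memorization_distance(gen_feat, train_feat):
    """Mean over generated samples of the minimum cosine distance to a training sample."""
    g = gen_feat / np.linalg.norm(gen_feat, axis=1, keepdims=True)
    t = train_feat / np.linalg.norm(train_feat, axis=1, keepdims=True)
    cos_dist = 1.0 - np.abs(g @ t.T)  # shape: (n_generated, n_train)
    return float(cos_dist.min(axis=1).mean())


def mifid(gen_feat, train_feat, ref_feat, tau=0.2, eps=1e-15):
    """Assumed form: FID against ref_feat, inflated when samples sit too close to train_feat.

    tau is a hypothetical placeholder threshold, not a value taken from this page.
    """
    d = memorization_distance(gen_feat, train_feat)
    d_thr = d if d < tau else 1.0  # no penalty when samples are far from training data
    return frechet_distance(gen_feat, ref_feat) / (d_thr + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 8))  # stand-in for training-image features
    ref = rng.normal(size=(200, 8))    # stand-in for held-out reference features
    copied = train[:100] + 1e-3 * rng.normal(size=(100, 8))  # near-duplicates
    fresh = rng.normal(size=(100, 8))                        # independent samples
    print("memorization distance (copied):", memorization_distance(copied, train))
    print("memorization distance (fresh): ", memorization_distance(fresh, train))
```

Under this assumed form, near-duplicates of training samples produce a small memorization distance and therefore a much larger (worse) score than plain FID, while samples far from the training set are scored by FID alone.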


Code & Models

Repositories

jybai/generative-memorization-benchmark (TensorFlow, official)
