MAUVE Scores for Generative Models: Theory and Practice

Krishna Pillutla; Lang Liu; John Thickstun; Sean Welleck; Swabha Swayamdipta; Rowan Zellers; Sewoong Oh; Yejin Choi; Zaid Harchaoui

arXiv:2212.14578 · cs.LG · December 8, 2023 · 5 cites

TL;DR

MAUVE is a new set of statistical scores designed to compare generative models' output distributions with real data, effectively measuring quality in text and images and correlating well with human judgments.

Contribution

This paper introduces MAUVE scores, a novel family of divergence-based metrics for evaluating generative models across text and image domains, with theoretical bounds and practical estimation methods.

Findings

01. MAUVE scores correlate with human judgments of text quality.

02. MAUVE effectively identifies properties of generated images.

03. The proposed methods outperform existing metrics in certain scenarios.

Abstract

Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore three approaches to statistically estimate these scores: vector quantization, non-parametric estimation, and classifier-based estimation. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and…
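As a rough illustration of how a score can summarize a divergence frontier, the sketch below assumes the two distributions have already been vector-quantized into histograms over the same bins. It traces a KL-based frontier over mixtures of the two histograms and takes the area under the resulting curve. The function names, the grid size `num_points`, and the scaling constant `c` are illustrative assumptions, not details taken from the abstract, and the quantization step itself is omitted.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mauve_from_histograms(p, q, num_points=100, c=5.0):
    """Area under a KL divergence frontier between histograms p and q.

    p, q: probabilities over the same quantization bins (each sums to 1).
    num_points and c are illustrative assumptions for this sketch.
    """
    # The curve runs from (0, 1) to (1, 0); the endpoints stand for the
    # limits where one of the two divergences is unbounded.
    curve = [(0.0, 1.0)]
    # Mixture weights strictly inside (0, 1), from high to low, so the
    # x-coordinate is non-decreasing along the curve.
    for i in range(num_points, 0, -1):
        lam = i / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        x = math.exp(-c * kl_divergence(q, r))  # one type of error
        y = math.exp(-c * kl_divergence(p, r))  # the other type of error
        curve.append((x, y))
    curve.append((1.0, 0.0))
    # Trapezoidal rule along the curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    print(mauve_from_histograms([0.5, 0.5], [0.5, 0.5]))  # identical histograms
    print(mauve_from_histograms([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
```

Under these assumptions, identical histograms give a score close to 1 and histograms with disjoint support give a score close to 0, with partially overlapping histograms falling in between.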

Code & Models

Repositories

krishnap25/mauve
pytorch

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computational and Text Analysis Methods · Multimodal Machine Learning Applications