How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?

Ferenc Huszár

arXiv:1511.05101·stat.ML·November 17, 2015·205 cites

TL;DR

This paper critiques scheduled sampling as a training method for generative models, argues that maximum likelihood is the wrong objective when the goal is to generate natural-looking samples, and introduces a generalized form of adversarial training whose analysis helps explain why adversarially trained models produce samples of higher perceived quality.

Contribution

It provides a theoretical critique of scheduled sampling, proposes an ideal training objective for generative models, and introduces a generalized adversarial training framework.

Findings

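01 The objective function underlying scheduled sampling is improper and leads to an inconsistent learning algorithm.

02 Maximum likelihood is an inappropriate training objective when the end goal is to generate natural-looking samples.

03 An analysis of generalized adversarial training offers an explanation for the higher perceived quality of adversarially generated samples.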
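The contrast between findings 02 and 03 can be made concrete with a small numerical sketch. Maximum likelihood asymptotically minimizes the divergence KL[P‖Q] from the data distribution P to the model Q, which rewards covering every mode of P. The reverse direction, KL[Q‖P], instead penalizes the model for placing mass where the data has little. The distributions below are hypothetical and chosen only for illustration, and treating KL[Q‖P] as the stand-in for a "natural-looking samples" objective is an assumption here, since the abstract on this page is truncated before it states the alternative.

```python
import math

def kl(p, q):
    """KL[p || q] for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical data distribution P with two modes (states 0 and 3).
P = [0.49, 0.01, 0.01, 0.49]

# Two hypothetical models: one spreads mass over every state,
# the other commits almost entirely to a single mode of P.
Q_cover = [0.25, 0.25, 0.25, 0.25]
Q_mode = [0.97, 0.01, 0.01, 0.01]

for name, Q in [("cover", Q_cover), ("mode", Q_mode)]:
    forward = kl(P, Q)   # the direction maximum likelihood minimizes
    reverse = kl(Q, P)   # the reverse direction
    print(f"{name}: KL[P||Q]={forward:.3f}  KL[Q||P]={reverse:.3f}")
```

In this toy setting the forward divergence prefers the spread-out model, while the reverse divergence prefers the model that commits to one mode, so every sample it draws looks plausible under P even though half of the data is ignored.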

Abstract

Modern applications and progress in deep learning research have created renewed interest for generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these models. In this paper we present two contributions. Firstly, we present a critique of scheduled sampling, a state-of-the-art training method that contributed to the winning entry to the MSCOCO image captioning benchmark in 2015. Here we show that despite this impressive empirical performance, the objective function underlying scheduled sampling is improper and leads to an inconsistent learning algorithm. Secondly, we revisit the problems that scheduled sampling was meant to address, and present an alternative interpretation. We argue that maximum likelihood is an inappropriate training objective when the end-goal is to generate natural-looking…
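For readers unfamiliar with the method under critique, the following is a minimal sketch of the input-mixing step commonly associated with scheduled sampling: at each position the model is conditioned either on the ground-truth token or on a token sampled from the model itself. The names `scheduled_sampling_inputs`, `sample_next`, and `epsilon` are hypothetical and are not taken from the paper, and the schedule by which `epsilon` is decayed during training is omitted.

```python
import random

def scheduled_sampling_inputs(targets, sample_next, epsilon, rng=random):
    """Build the sequence of conditioning tokens for one training example.

    targets:     ground-truth token sequence.
    sample_next: hypothetical callable mapping the prefix so far (a list)
                 to a token sampled from the model.
    epsilon:     probability of feeding the ground-truth token at each step
                 (1.0 reproduces ordinary teacher forcing).
    """
    prefix = []
    for true_token in targets:
        if rng.random() < epsilon:
            prefix.append(true_token)           # condition on the real data
        else:
            prefix.append(sample_next(prefix))  # condition on the model's own sample
    return prefix

# Toy usage with a stand-in "model" that always emits the same token.
mixed = scheduled_sampling_inputs(list("hello"), lambda prefix: "?", epsilon=0.5)
print(mixed)
```

The loss at each step is still computed against the ground-truth next token, which is the part of the procedure the paper's critique targets: the model is scored on real targets while being conditioned on a mixture of real and self-generated history.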

Code & Models

Repositories

bloomberg/mixce-acl2023 (PyTorch)

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare