Mode recovery in neural autoregressive sequence modeling

Ilia Kulikov; Sean Welleck; Kyunghyun Cho

arXiv:2106.05459·cs.LG·June 11, 2021

TL;DR

This paper investigates how modes of data distributions are preserved or lost in neural autoregressive sequence models, revealing that mode recovery is heavily influenced by the entire learning process and distribution structure.

Contribution

The study introduces a new mode recovery cost metric and a tractable testbed to analyze mode preservation across the learning chain in sequence modeling.
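
The summary here does not spell out how the mode recovery cost is computed. One plausible reading, stated as an assumption rather than as the authors' definition, is: the smallest k' such that the k' most probable sequences under a second distribution q contain all of the k most probable sequences under a first distribution p. A minimal sketch over toy finite distributions follows; the function names `top_k` and `mode_recovery_cost` and the example numbers are hypothetical.

```python
from typing import Dict, List


def top_k(dist: Dict[str, float], k: int) -> List[str]:
    """Return the k most probable sequences, breaking ties by sequence text."""
    ranked = sorted(dist, key=lambda s: (-dist[s], s))
    return ranked[:k]


def mode_recovery_cost(p: Dict[str, float], q: Dict[str, float], k: int) -> float:
    """Assumed definition: smallest k' such that top-k'(q) contains top-k(p).

    Returns float('inf') if some top-k sequence of p is missing from q's support.
    """
    remaining = set(top_k(p, k))
    if not remaining:
        return 0
    ranked_q = sorted(q, key=lambda s: (-q[s], s))
    for rank, seq in enumerate(ranked_q, start=1):
        remaining.discard(seq)
        if not remaining:
            return rank
    return float("inf")


# Toy example: the top mode of p ("a") is only the third most probable under q.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.5, "c": 0.3}
print(mode_recovery_cost(p, q, k=1))  # 3
print(mode_recovery_cost(p, p, k=2))  # 2
```

Under this reading, a cost equal to k means the modes are recovered perfectly, and larger values mean more of q's probable sequences must be enumerated before all of p's modes are found.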

Findings

1. Mode recovery cost varies significantly with distribution type.

2. Learning can either improve or worsen mode recovery, especially with semi-structured data.

3. Decoding performance is strongly affected by earlier learning choices.

Abstract

Despite their wide use, recent studies have revealed unexpected and undesirable properties of neural autoregressive sequence models trained with maximum likelihood, such as an unreasonably high affinity to short sequences after training and to infinitely long sequences at decoding time. We propose to study these phenomena by investigating how the modes, or local maxima, of a distribution are maintained throughout the full learning chain of the ground-truth, empirical, learned and decoding-induced distributions, via the newly proposed mode recovery cost. We design a tractable testbed where we build three types of ground-truth distributions: (1) an LSTM-based structured distribution, (2) an unstructured distribution where the probability of a sequence does not depend on its content, and (3) a product of these two which we call a semi-structured distribution. Our study reveals both expected and…
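
To make the third distribution type concrete, the following sketch builds a semi-structured distribution as the renormalized product of a structured and an unstructured distribution over a small shared set of sequences. The abstract only says "a product of these two"; the renormalization step, the restriction to a finite shared support, the toy probabilities, and the names `normalize` and `semi_structured` are all assumptions made for illustration. In the paper the structured distribution is LSTM-based, which this toy dictionary does not attempt to reproduce.

```python
from typing import Dict


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have positive total mass")
    return {seq: w / total for seq, w in weights.items()}


def semi_structured(
    structured: Dict[str, float], unstructured: Dict[str, float]
) -> Dict[str, float]:
    """Assumed construction: pointwise product on the shared support, renormalized."""
    shared = structured.keys() & unstructured.keys()
    return normalize({seq: structured[seq] * unstructured[seq] for seq in shared})


# Hypothetical toy distributions over three sequences.
structured = {"ab": 0.6, "ba": 0.3, "aa": 0.1}    # stands in for a content-dependent model
unstructured = {"ab": 0.2, "ba": 0.2, "aa": 0.6}  # values assigned without regard to content

mixed = semi_structured(structured, unstructured)
for seq in sorted(mixed):
    print(seq, round(mixed[seq], 3))
```

In this toy case the product reweights the structured distribution: "aa" is the least probable sequence under `structured` but ties with "ba" after multiplication, which illustrates how the ranking of modes can shift between distribution types.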

Code & Models

Repositories

uralik/mode_recovery (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory