Mode recovery in neural autoregressive sequence modeling
Ilia Kulikov, Sean Welleck, Kyunghyun Cho

TL;DR
This paper investigates how modes of data distributions are preserved or lost in neural autoregressive sequence models, revealing that mode recovery is heavily influenced by the entire learning process and distribution structure.
Contribution
The study introduces a new mode recovery cost metric and a tractable testbed to analyze mode preservation across the learning chain in sequence modeling.
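As a rough illustration of what such a metric can look like, the sketch below assumes a finite set of sequences. It takes the cost of recovering the top-k modes of a distribution p from a distribution q to be the size of the smallest top-ranked set of q that contains all of them. This formalization and the names `top_k_set` and `mode_recovery_cost` are assumptions made for illustration; the summary above does not spell out the definition.

```python
from typing import Dict, List, Set

Dist = Dict[str, float]  # maps a sequence (as a string) to its probability


def ranked(dist: Dist) -> List[str]:
    """Sequences sorted by decreasing probability (ties broken alphabetically)."""
    return sorted(dist, key=lambda s: (-dist[s], s))


def top_k_set(dist: Dist, k: int) -> Set[str]:
    """The k highest-probability sequences of dist."""
    return set(ranked(dist)[:k])


def mode_recovery_cost(p: Dist, q: Dist, k: int) -> int:
    """Smallest k' such that the top-k' set of q covers the top-k set of p.

    Assumed formalization. Returns len(q) + 1 when some mode of p
    is absent from q altogether, i.e. it can never be recovered.
    """
    remaining = top_k_set(p, k)
    if not remaining:
        return 0
    for rank, seq in enumerate(ranked(q), start=1):
        remaining.discard(seq)
        if not remaining:
            return rank
    return len(q) + 1


# Toy distributions over four sequences (made-up numbers).
p = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
q = {"a": 0.4, "c": 0.3, "d": 0.2, "b": 0.1}

print(mode_recovery_cost(p, q, k=2))
print(mode_recovery_cost(p, p, k=2))
```

Under this reading, the cost equals k when q ranks the modes of p exactly as p does, and it grows as those modes sink lower in q's ranking.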
Findings
Mode recovery cost varies significantly with distribution type.
Learning can either improve or worsen mode recovery, especially with semi-structured data.
Decoding performance is highly affected by earlier learning choices.
Abstract
Despite their wide use, recent studies have revealed unexpected and undesirable properties of neural autoregressive sequence models trained with maximum likelihood, such as an unreasonably high affinity to short sequences after training and to infinitely long sequences at decoding time. We propose to study these phenomena by investigating how the modes, or local maxima, of a distribution are maintained throughout the full learning chain of the ground-truth, empirical, learned and decoding-induced distributions, via the newly proposed mode recovery cost. We design a tractable testbed where we build three types of ground-truth distributions: (1) an LSTM-based structured distribution, (2) an unstructured distribution where the probability of a sequence does not depend on its content, and (3) a product of these two, which we call a semi-structured distribution. Our study reveals both expected and…
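The three kinds of ground-truth distribution can be mimicked on a tiny scale with the sketch below. It is only a toy under stated assumptions: a hand-written scoring rule that rewards repeated tokens stands in for the LSTM, the sequence space is every string of length 1 to 3 over a two-symbol vocabulary, and all names (`structured`, `unstructured`, `semi_structured`) are hypothetical.

```python
import itertools
import random

VOCAB = ["a", "b"]
MAX_LEN = 3


def all_sequences():
    """Every sequence of length 1..MAX_LEN over VOCAB, as a string."""
    for n in range(1, MAX_LEN + 1):
        for tokens in itertools.product(VOCAB, repeat=n):
            yield "".join(tokens)


def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def structured(seqs):
    """Content-dependent stand-in for the LSTM: repeated tokens score higher."""
    weights = {}
    for s in seqs:
        w = 1.0
        for prev, cur in zip(s, s[1:]):
            w *= 0.8 if prev == cur else 0.2
        weights[s] = w
    return normalize(weights)


def unstructured(seqs, rng):
    """Probabilities drawn at random, independent of each sequence's content."""
    return normalize({s: rng.uniform(0.1, 1.0) for s in seqs})


def semi_structured(p_struct, p_unstruct):
    """Renormalized product of the structured and unstructured distributions."""
    return normalize({s: p_struct[s] * p_unstruct[s] for s in p_struct})


seqs = list(all_sequences())
rng = random.Random(0)  # fixed seed so the toy example is repeatable
p_struct = structured(seqs)
p_unstruct = unstructured(seqs, rng)
p_semi = semi_structured(p_struct, p_unstruct)

for name, dist in [("structured", p_struct),
                   ("unstructured", p_unstruct),
                   ("semi-structured", p_semi)]:
    print(name, "top mode:", max(dist, key=dist.get))
```

Because the product multiplies a content-dependent score by a content-independent one, the modes of the semi-structured distribution need not coincide with those of either factor.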
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
