Global Autoregressive Models for Data-Efficient Sequence Learning

Tetiana Parshakova; Jean-Marc Andreoli; Marc Dymetman

arXiv:1909.07063 · cs.LG · September 23, 2019

TL;DR

This paper introduces Global Autoregressive Models (GAMs), which combine autoregressive and log-linear components to improve data efficiency in sequence learning, especially under small-data conditions.
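
A minimal sketch of how a GAM scores a sequence. It assumes the form P(x) = r(x) · exp(⟨λ, φ(x)⟩), where r is the autoregressive component and φ(x) is a vector of global features of the whole sequence. The bigram table `NEXT`, the feature function `phi`, and the weights are hypothetical toy choices, not taken from the paper or its repository.

```python
import math

# Hypothetical toy autoregressive component r: a bigram model over '0'/'1'
# strings. '^' marks the start of a sequence and '$' the end-of-sequence token.
NEXT = {
    "^": {"0": 0.45, "1": 0.45, "$": 0.10},
    "0": {"0": 0.60, "1": 0.30, "$": 0.10},
    "1": {"0": 0.30, "1": 0.60, "$": 0.10},
}


def log_r(x):
    """Log-probability of x (plus its end-of-sequence token) under r."""
    prev, total = "^", 0.0
    for sym in x + "$":
        total += math.log(NEXT[prev][sym])
        prev = sym
    return total


def phi(x):
    """Hypothetical global binary features of the whole sequence."""
    return [1.0 if "11" in x else 0.0, 1.0 if len(x) >= 5 else 0.0]


def log_P(x, lam):
    """Unnormalized GAM log-score: log r(x) + <lam, phi(x)>."""
    return log_r(x) + sum(l * f for l, f in zip(lam, phi(x)))


print(log_P("0110", [1.0, -0.5]))  # log(0.45*0.3*0.6*0.3*0.1) + 1.0
```

The score is cheap to compute for any single x, but it is not a probability: the normalizer Z(λ) = Σ_x r(x) · exp(⟨λ, φ(x)⟩) ranges over all sequences, which is why an unnormalized GAM is not directly usable for fast sampling or evaluation.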

Contribution

The paper proposes GAMs, which integrate global features into autoregressive models, together with a two-step training process: an unnormalized GAM is first fit to the data, then distilled into a normalized autoregressive model used for inference.
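
The first training step can be sketched as moment matching. Assuming p_λ(x) ∝ r(x) · exp(⟨λ, φ(x)⟩) with r already fit to the data, the gradient of the average log-likelihood with respect to λ is E_data[φ] − E_{p_λ}[φ]. Below, the model expectation is estimated by self-normalized importance sampling with r as the proposal, so each sample's weight is proportional to exp(⟨λ, φ(x)⟩). The toy model, features, dataset, and hyperparameters (`steps`, `lr`, `n_samples`) are hypothetical; the paper's actual optimizer and sampling scheme may differ.

```python
import math
import random

# Hypothetical toy bigram model r ('^' = start, '$' = end of sequence).
NEXT = {
    "^": {"0": 0.45, "1": 0.45, "$": 0.10},
    "0": {"0": 0.60, "1": 0.30, "$": 0.10},
    "1": {"0": 0.30, "1": 0.60, "$": 0.10},
}


def sample_r(rng):
    """Draw one sequence from r, symbol by symbol, until '$'."""
    prev, out = "^", []
    while True:
        syms, probs = zip(*NEXT[prev].items())
        sym = rng.choices(syms, weights=probs)[0]
        if sym == "$":
            return "".join(out)
        out.append(sym)
        prev = sym


def phi(x):
    """Hypothetical global binary features."""
    return [1.0 if "11" in x else 0.0, 1.0 if len(x) >= 5 else 0.0]


def fit_lambda(data, steps=500, lr=0.5, n_samples=1000, seed=0):
    """Gradient ascent on the GAM log-likelihood with respect to lambda only."""
    rng = random.Random(seed)
    k = len(phi(data[0]))
    # Empirical feature moments E_data[phi].
    target = [sum(phi(x)[j] for x in data) / len(data) for j in range(k)]
    # A fixed pool of proposals drawn from r, reused at every step.
    feats = [phi(sample_r(rng)) for _ in range(n_samples)]
    lam = [0.0] * k
    for _ in range(steps):
        # Importance weights p_lambda / r are proportional to exp(<lam, phi>).
        logw = [sum(l * f for l, f in zip(lam, fx)) for fx in feats]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        model = [sum(wi * fx[j] for wi, fx in zip(w, feats)) / total for j in range(k)]
        # Gradient step: E_data[phi] - E_p_lambda[phi].
        lam = [l + lr * (t - m) for l, t, m in zip(lam, target, model)]
    return lam, target, model


data = ["0110", "01010", "111", "0011011", "10"]  # hypothetical tiny dataset
lam, target, model = fit_lambda(data)
print(lam, target, model)
```

At convergence the GAM's feature expectations match the empirical ones, which is how a handful of global features can inject prior knowledge that the small dataset alone would not teach r.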

Findings

1. Significant perplexity reduction with GAMs in language modeling
2. Effective use of global features to compensate for limited data
3. Two-step training improves inference speed and accuracy
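
Perplexity results require evaluating the normalized GAM, whose partition function is Z(λ) = Σ_x r(x) · exp(⟨λ, φ(x)⟩) = E_{x∼r}[exp(⟨λ, φ(x)⟩)]. A minimal sketch, assuming p_λ(x) = r(x) · exp(⟨λ, φ(x)⟩) / Z(λ) and a hypothetical toy bigram r and features, estimates log Z by Monte Carlo and then computes per-token perplexity with the end-of-sequence symbol counted as a token. The functions `estimate_log_Z` and `perplexity` are illustrative names, not the repository's API.

```python
import math
import random

# Hypothetical toy bigram model r ('^' = start, '$' = end of sequence).
NEXT = {
    "^": {"0": 0.45, "1": 0.45, "$": 0.10},
    "0": {"0": 0.60, "1": 0.30, "$": 0.10},
    "1": {"0": 0.30, "1": 0.60, "$": 0.10},
}


def log_r(x):
    """Log-probability of x (plus its end-of-sequence token) under r."""
    prev, total = "^", 0.0
    for sym in x + "$":
        total += math.log(NEXT[prev][sym])
        prev = sym
    return total


def sample_r(rng):
    """Draw one sequence from r, symbol by symbol, until '$'."""
    prev, out = "^", []
    while True:
        syms, probs = zip(*NEXT[prev].items())
        sym = rng.choices(syms, weights=probs)[0]
        if sym == "$":
            return "".join(out)
        out.append(sym)
        prev = sym


def phi(x):
    """Hypothetical global binary features."""
    return [1.0 if "11" in x else 0.0, 1.0 if len(x) >= 5 else 0.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def estimate_log_Z(lam, n=5000, seed=0):
    """Monte Carlo estimate of log Z(lam) = log E_{x~r}[exp(<lam, phi(x)>)]."""
    rng = random.Random(seed)
    vals = [dot(lam, phi(sample_r(rng))) for _ in range(n)]
    top = max(vals)
    # Log-mean-exp, shifted by the maximum for numerical stability.
    return top + math.log(sum(math.exp(v - top) for v in vals) / n)


def perplexity(test, lam, log_Z):
    """Per-token perplexity of the normalized GAM; '$' counts as a token."""
    log_p = sum(log_r(x) + dot(lam, phi(x)) - log_Z for x in test)
    n_tokens = sum(len(x) + 1 for x in test)
    return math.exp(-log_p / n_tokens)


lam = [1.0, -1.0]  # hypothetical feature weights
test = ["0110", "10", "0011"]  # hypothetical held-out sequences
print(perplexity(test, lam, estimate_log_Z(lam)))
```

The estimate of log Z is noisy and, being the log of a sample mean, biased for finite n. This is one motivation for evaluating a distilled, properly normalized autoregressive model instead.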

Abstract

Standard autoregressive seq2seq models are easily trained by max-likelihood, but tend to show poor results under small-data conditions. We introduce a class of seq2seq models, GAMs (Global Autoregressive Models), which combine an autoregressive component with a log-linear component, allowing the use of global a priori features to compensate for lack of data. We train these models in two steps. In the first step, we obtain an unnormalized GAM that maximizes the likelihood of the data, but is improper for fast inference or evaluation. In the second step, we use this GAM to train (by distillation) a second autoregressive model that approximates the normalized distribution associated with the GAM, and can be used for fast inference and evaluation. Our experiments focus on language modelling under synthetic conditions and show a strong perplexity reduction of using the…
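
The second step can be sketched as follows, assuming p_λ(x) ∝ r(x) · exp(⟨λ, φ(x)⟩) with binary features. Since ⟨λ, φ(x)⟩ ≤ Σ_j max(λ_j, 0) whenever every φ_j(x) ∈ {0, 1}, a sample from r can be accepted with probability exp(⟨λ, φ(x)⟩ − Σ_j max(λ_j, 0)), which yields exact samples from the normalized GAM. A second autoregressive model is then fit to those samples by maximum likelihood. Here that student is a hypothetical add-one-smoothed bigram table standing in for a neural network, and rejection sampling is one possible sampler, not necessarily the paper's exact procedure.

```python
import math
import random

# Hypothetical toy bigram model r ('^' = start, '$' = end of sequence).
NEXT = {
    "^": {"0": 0.45, "1": 0.45, "$": 0.10},
    "0": {"0": 0.60, "1": 0.30, "$": 0.10},
    "1": {"0": 0.30, "1": 0.60, "$": 0.10},
}
SYMS = "01$"


def sample_r(rng):
    """Draw one sequence from r, symbol by symbol, until '$'."""
    prev, out = "^", []
    while True:
        syms, probs = zip(*NEXT[prev].items())
        sym = rng.choices(syms, weights=probs)[0]
        if sym == "$":
            return "".join(out)
        out.append(sym)
        prev = sym


def phi(x):
    """Hypothetical global binary features."""
    return [1.0 if "11" in x else 0.0, 1.0 if len(x) >= 5 else 0.0]


def sample_gam(lam, n, rng):
    """Exact samples from the normalized GAM via rejection sampling from r."""
    bound = sum(max(l, 0.0) for l in lam)  # upper bound on <lam, phi(x)>
    out = []
    while len(out) < n:
        x = sample_r(rng)
        score = sum(l * f for l, f in zip(lam, phi(x)))
        if rng.random() < math.exp(score - bound):
            out.append(x)
    return out


def distill_bigram(samples):
    """Fit a bigram student by maximum likelihood with add-one smoothing."""
    counts = {p: {s: 1 for s in SYMS} for p in "^01"}
    for x in samples:
        prev = "^"
        for sym in x + "$":
            counts[prev][sym] += 1
            prev = sym
    return {p: {s: c / sum(row.values()) for s, c in row.items()}
            for p, row in counts.items()}


rng = random.Random(0)
lam = [2.0, 0.0]  # hypothetical weights favouring sequences containing '11'
samples = sample_gam(lam, 2000, rng)
pi = distill_bigram(samples)
print(pi["1"])
```

The acceptance rate equals Z(λ) · exp(−Σ_j max(λ_j, 0)), so rejection sampling becomes expensive as positive weights grow; the distilled student, once trained, samples and scores sequences at the cost of an ordinary autoregressive model.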

Code & Models

Repositories

parshakova/GAMS-for-Data-Efficient-Learning
PyTorch · Official

Taxonomy

Methods: Generalized additive models · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence