Discovering Non-monotonic Autoregressive Orderings with Variational Inference

Xuanlin Li; Brandon Trabucco; Dong Huk Park; Michael Luo; Sheng Shen; Trevor Darrell; Yang Gao

arXiv:2110.15797·cs.CL·November 1, 2021·1 cites

TL;DR

This paper introduces an unsupervised, parallelizable method that uses variational inference and policy gradients to discover high-quality autoregressive orderings for language modeling; the discovered orderings match or outperform fixed-order baselines.

Contribution

It proposes a novel variational inference framework with a Transformer encoder for learning permutation-based orderings without supervision.
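
The variational framing can be written as a standard evidence lower bound in which the generation order is the latent variable. The notation is an assumption made for illustration and is not taken from this page: $x$ is an optional conditioning input, $y$ the target sequence, $z$ a permutation giving the generation order, $q_\phi$ the encoder, and $p_\theta$ the decoder language model.

```latex
\log p_\theta(y \mid x)
  \;\ge\;
  \mathbb{E}_{z \sim q_\phi(z \mid x, y)}
    \bigl[ \log p_\theta(y, z \mid x) \bigr]
  \;+\;
  \mathcal{H}\bigl( q_\phi(\cdot \mid x, y) \bigr)
```

Because $z$ ranges over discrete permutations, the gradient of the expectation with respect to $\phi$ cannot be obtained by ordinary backpropagation through the sample, which is the usual motivation for a policy-gradient estimator.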

Findings

01. Discovered orderings are competitive with or better than fixed orders.

02. Method is context-aware and scalable due to parallelizable design.

03. Achieved effective learning of sequence orderings from data alone.

Abstract

The predominant approach for language modeling is to process sequences from left to right, but this eliminates a source of information: the order by which the sequence was generated. One strategy to recover this information is to decode both the content and ordering of tokens. Existing approaches supervise content and ordering by designing problem-specific loss functions and pre-training with an ordering pre-selected. Other recent works use iterative search to discover problem-specific orderings for training, but suffer from high time complexity and cannot be efficiently parallelized. We address these limitations with an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data -- no domain knowledge required. The learner contains an encoder network and decoder language model that perform variational inference with autoregressive orders…
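
The idea of treating the generation order as a latent variable and training with policy gradients can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Plackett-Luce distribution stands in for the encoder network, `toy_decoder_log_likelihood` is a hypothetical placeholder for the decoder language model, and all function names are invented for this example.

```python
import numpy as np


def sample_order(scores, rng):
    """Sample a permutation from a Plackett-Luce distribution (Gumbel trick)."""
    gumbel = rng.gumbel(size=scores.shape)
    return np.argsort(-(scores + gumbel))


def log_prob_order(scores, order):
    """Log-probability of `order` under Plackett-Luce with logits `scores`."""
    order = np.asarray(order)
    logp = 0.0
    for i in range(len(order)):
        # Step i picks order[i] among the items that are still unplaced.
        logits = scores[order[i:]]
        m = logits.max()
        logp += logits[0] - (m + np.log(np.exp(logits - m).sum()))
    return logp


def grad_log_prob_order(scores, order):
    """Gradient of log_prob_order with respect to `scores`."""
    order = np.asarray(order)
    grad = np.zeros_like(scores, dtype=float)
    for i in range(len(order)):
        remaining = order[i:]
        logits = scores[remaining]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad[order[i]] += 1.0
        grad[remaining] -= probs
    return grad


def toy_decoder_log_likelihood(order):
    """Hypothetical decoder score: penalize each inversion in the order."""
    order = np.asarray(order)
    n = len(order)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if order[i] > order[j]
    )
    return -float(inversions)


def estimate_elbo_gradient(scores, decoder_log_lik, num_samples, rng):
    """Monte Carlo ELBO estimate and score-function (REINFORCE) gradient."""
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2 for the baseline")
    orders = [sample_order(scores, rng) for _ in range(num_samples)]
    # Per-sample ELBO term: log p(y, z) - log q(z).
    rewards = np.array(
        [decoder_log_lik(z) - log_prob_order(scores, z) for z in orders]
    )
    baseline = rewards.mean()
    grad = np.zeros_like(scores, dtype=float)
    for z, r in zip(orders, rewards):
        grad += (r - baseline) * grad_log_prob_order(scores, z)
    # Dividing by (n - 1) makes the batch-mean baseline equivalent to a
    # leave-one-out baseline, which keeps the estimator unbiased.
    return rewards.mean(), grad / (num_samples - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = np.zeros(4)
    for _ in range(200):
        _, grad = estimate_elbo_gradient(
            scores, toy_decoder_log_likelihood, 16, rng
        )
        scores = scores + 0.1 * grad  # gradient ascent on the ELBO estimate
    print(scores)
```

In a full system the per-position scores would come from an encoder that reads the sequence, and the decoder term would be the log-likelihood of the tokens generated in the sampled order; here both are replaced by the simplest stand-ins that keep the estimator itself visible.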

Code & Models

Repositories

xuanlinli17/autoregressive_inference · TensorFlow · Official

Videos

Discovering Non-monotonic Autoregressive Orderings with Variational Inference · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Variational Inference · Dropout · Label Smoothing · Layer Normalization · Dense Connections · Residual Connection · Adam · Multi-Head Attention