Training and Inference on Any-Order Autoregressive Models the Right Way
Andy Shih, Dorsa Sadigh, Stefano Ermon

TL;DR
This paper improves Any-Order Autoregressive Models (AO-ARMs) by reducing redundancy in the probabilistic model and reweighting the training loss, achieving state-of-the-art performance on arbitrary conditional inference across multiple data domains.
Contribution
The paper reduces redundancy in AO-ARMs by training on a smaller set of univariate conditionals, and it upweights the conditionals that are evaluated more frequently, improving performance without losing tractable arbitrary conditional inference.
Findings
Achieved state-of-the-art likelihoods on Text8, CIFAR10, ImageNet32, and tabular data.
Reduced redundancy in probabilistic modeling of AO-ARMs.
Improved performance by upweighting training on frequently evaluated conditionals.
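The third finding can be illustrated with a small tally of how often each univariate conditional is evaluated. This is a minimal sketch, not the authors' procedure: the uniform query distribution, the ascending-index decomposition rule, and the names `decompose`, `usage`, and `weights` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

n = 3  # number of variables (toy size)

def decompose(evidence, query):
    """Split p(x_query | x_evidence) into univariate conditionals.

    Assumed rule (hypothetical): add query variables in ascending index order.
    Yields (target, context) pairs, where context is a frozenset of indices.
    """
    context = set(evidence)
    for target in sorted(query):
        yield target, frozenset(context)
        context.add(target)

usage = Counter()
# Assumed query distribution: every (evidence, query) split is equally likely.
# Each variable is labelled 'e' (evidence), 'q' (query), or '-' (marginalised out).
for roles in product("eq-", repeat=n):
    evidence = [i for i, r in enumerate(roles) if r == "e"]
    query = [i for i, r in enumerate(roles) if r == "q"]
    if query:  # skip splits with nothing to predict
        usage.update(decompose(evidence, query))

# Normalise the counts into per-conditional weights.
total = sum(usage.values())
weights = {cond: count / total for cond, count in usage.items()}

for (target, context), w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"p(x{target} | x{sorted(context)}): weight {w:.3f}")
```

Under these assumptions some conditionals are reached more often than others, so a training loss could weight each conditional by `weights` instead of treating all of them uniformly.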
Abstract
Conditional inference on arbitrary subsets of variables is a core problem in probabilistic inference with important applications such as masked language modeling and image inpainting. In recent years, the family of Any-Order Autoregressive Models (AO-ARMs) -- closely related to popular models such as BERT and XLNet -- has shown breakthrough performance in arbitrary conditional tasks across a sweeping range of domains. But, in spite of their success, in this paper we identify significant improvements to be made to previous formulations of AO-ARMs. First, we show that AO-ARMs suffer from redundancy in their probabilistic model, i.e., they define the same distribution in multiple different ways. We alleviate this redundancy by training on a smaller set of univariate conditionals that still maintains support for efficient arbitrary conditional inference. Second, we upweight the training…
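The redundancy described in the abstract can be seen on a toy example. The sketch below assumes an explicit random joint table over three binary variables rather than a learned network, and computes the same joint probability through every variable ordering using exact univariate conditionals. The helper names (`marginal`, `conditional`, `joint_via_ordering`) are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 3  # three binary variables (toy size)

# Explicit joint table p(x0, x1, x2); stands in for a model's distribution.
joint = rng.random((2,) * n)
joint /= joint.sum()

def marginal(assignment):
    """Probability of a partial assignment {index: value}, summing out the rest."""
    index = tuple(assignment.get(i, slice(None)) for i in range(n))
    return joint[index].sum()

def conditional(i, value, context):
    """Exact univariate conditional p(x_i = value | context)."""
    return marginal({**context, i: value}) / marginal(context)

def joint_via_ordering(x, order):
    """Chain-rule product of univariate conditionals along one ordering."""
    context, prob = {}, 1.0
    for i in order:
        prob *= conditional(i, x[i], context)
        context[i] = x[i]
    return prob

x = (1, 0, 1)
probs = {
    order: joint_via_ordering(x, order)
    for order in itertools.permutations(range(n))
}

for order, p in probs.items():
    print(order, p)
print("table lookup:", joint[x])
```

All six orderings agree with the table lookup because the conditionals come from one joint table. An AO-ARM is trained to represent all of these factorizations with a single network, which is the sense in which the abstract says the same distribution is defined in multiple ways.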
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Attention Is All You Need · WordPiece · Weight Decay · Attention Dropout · BERT · Linear Layer · Multi-Head Attention · Adam · Linear Warmup With Linear Decay
