Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One
Yezhen Wang, Tong Che, Bo Li, Kaitao Song, Hengzhi Pei, Yoshua Bengio, Dongsheng Li

TL;DR
This paper introduces E-ARM, a novel training method that transforms autoregressive models into energy-based models, improving their ability to handle sequential data by reducing exposure bias and enhancing coherence.
Contribution
The paper presents a unique energy-based training approach for autoregressive models that requires no extra parameters and improves their distribution modeling capabilities.
Findings
E-ARM effectively alleviates exposure bias.
It increases temporal coherence in generated sequences.
Empirical results show improved performance on language, translation, and image tasks.
Abstract
Autoregressive generative models are commonly used, especially for tasks involving sequential data. They have, however, been plagued by inherent flaws stemming from chain-style conditional modeling (e.g., exposure bias or lack of long-range coherence), which severely limit their ability to model distributions properly. In this paper, we propose a method termed E-ARM for training autoregressive generative models that takes advantage of a well-designed energy-based learning objective. By leveraging the extra degree of freedom of the softmax operation, we can make the autoregressive model itself an energy-based model for measuring the likelihood of input without introducing any extra parameters. Furthermore, we show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem and increasing temporal…
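The "extra degree of freedom of the softmax operation" mentioned in the abstract can be illustrated with a small sketch. Softmax is unchanged when the same constant is added to every logit, so the overall level of the logits is not pinned down by the next-token probabilities and is, in principle, free to carry an unnormalized score. The code below only demonstrates that invariance; it is not the paper's implementation, and the helper names (`softmax`, `logsumexp`) and example values are illustrative assumptions. How E-ARM actually builds its energy from the logits is not stated in this summary.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logsumexp(logits):
    """Log of the softmax normalizer, computed stably."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

# Hypothetical logits for a 3-token vocabulary at one decoding step.
logits = [2.0, -1.0, 0.5]
shift = 3.7  # arbitrary constant added to every logit
shifted = [z + shift for z in logits]

probs = softmax(logits)
probs_shifted = softmax(shifted)

# The conditional distribution is identical before and after the shift.
same_probs = all(
    math.isclose(p, q, rel_tol=1e-9) for p, q in zip(probs, probs_shifted)
)

# The log-normalizer, however, moves by exactly the shift: this is the
# quantity that softmax leaves unconstrained.
gap = logsumexp(shifted) - logsumexp(logits)
```

Here `same_probs` is true while `gap` equals the shift, which is the sense in which a model's logits hold one more degree of freedom per step than its conditional probabilities use.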
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing
Methods: Softmax
