Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

Hao Fu; Chunyuan Li; Xiaodong Liu; Jianfeng Gao; Asli Celikyilmaz; Lawrence Carin

arXiv:1903.10145·cs.LG·June 12, 2019·169 cites

TL;DR

This paper introduces a cyclical annealing schedule for β, the weight on the KL term in the VAE objective. The schedule mitigates KL vanishing by letting the model learn progressively more meaningful latent codes over multiple cycles, which improves performance on NLP tasks.

Contribution

The paper proposes a simple cyclical annealing schedule for β that improves latent code learning in VAEs and addresses the KL vanishing problem in NLP applications.

Findings

1. Improved language modeling performance.

2. Enhanced dialog response generation.

3. More effective unsupervised language pre-training.

Abstract

Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter β. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for β, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing β multiple times. This new procedure allows the progressive learning of more meaningful latent codes, by leveraging the informative representations of previous cycles as warm re-starts. The effectiveness of cyclical annealing is validated on a broad range of NLP tasks,…
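
The schedule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the authors' code: the function names `cyclical_beta` and `weighted_vae_loss`, the linear ramp shape, and the default values for `n_cycles`, `ramp_fraction`, and `beta_max` are all assumptions chosen for illustration. The abstract only states that the process of increasing β is repeated multiple times.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5, beta_max=1.0):
    """Return the KL weight beta at a given training step.

    Assumed shape (hypothetical, for illustration): within each cycle,
    beta rises linearly from 0 to beta_max over the first `ramp_fraction`
    of the cycle, then stays at beta_max until the cycle ends and beta
    restarts from 0.
    """
    if not 0.0 < ramp_fraction <= 1.0:
        raise ValueError("ramp_fraction must be in (0, 1]")
    if n_cycles < 1 or total_steps < 1:
        raise ValueError("n_cycles and total_steps must be positive")

    cycle_len = total_steps / n_cycles
    # Position inside the current cycle, in [0, 1).
    pos = (step % cycle_len) / cycle_len
    return beta_max * min(1.0, pos / ramp_fraction)


def weighted_vae_loss(reconstruction_loss, kl_divergence, beta):
    """Combine the two terms of the objective, with beta weighting the KL term."""
    return reconstruction_loss + beta * kl_divergence


if __name__ == "__main__":
    total = 1000
    for step in (0, 50, 125, 200, 250, 300):
        print(step, cyclical_beta(step, total))
```

With these assumed defaults, each cycle spans 250 steps: β climbs during the first 125 steps, holds at `beta_max` for the remaining 125, and then drops back to 0 at the start of the next cycle, which is the restart the abstract refers to.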

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Algorithms
