Finetuning Pretrained Transformers into Variational Autoencoders
Seongmin Park, Jihwa Lee

TL;DR
This paper introduces a simple two-phase finetuning method to convert pretrained Transformers into variational autoencoders, addressing posterior collapse without extensive pretraining, and evaluates its effectiveness.
Contribution
It proposes a novel, resource-efficient finetuning scheme to transform Transformers into VAEs, enabling broader access and application.
Findings
Competitive performance with large-scale pretrained VAEs on some metrics
Effective mitigation of posterior collapse through the proposed method
Comprehensive analysis of existing collapse alleviation techniques
Abstract
Text variational autoencoders (VAEs) are notorious for posterior collapse, a phenomenon where the model's decoder learns to ignore signals from the encoder. Because posterior collapse is known to be exacerbated by expressive decoders, Transformers have seen limited adoption as components of text VAEs. Existing studies that incorporate Transformers into text VAEs (Li et al., 2020; Fang et al., 2021) mitigate posterior collapse using massive pretraining, a technique unavailable to most of the research community without extensive computing resources. We present a simple two-phase training scheme to convert a sequence-to-sequence Transformer into a VAE with just finetuning. The resulting language model is competitive with massively pretrained Transformer-based VAEs on some internal metrics while falling short on others. To facilitate training, we comprehensively explore the impact of common…
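Posterior collapse can be made concrete with the standard VAE objective. The sketch below is a generic illustration, not the authors' two-phase training scheme: it assumes a diagonal Gaussian posterior q(z|x) = N(mu, sigma^2) and a standard normal prior, and the function names `reparameterize`, `gaussian_kl`, `negative_elbo`, `informative_dims`, the `beta` weight and the 0.01 threshold are all hypothetical. When the KL term is near zero in every latent dimension, the posterior matches the prior and z carries no information about the input, which is what collapse looks like numerically.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu, logvar):
    """Per-dimension KL(N(mu, sigma^2) || N(0, 1)) for a diagonal Gaussian posterior."""
    return [0.5 * (m * m + math.exp(lv) - 1.0 - lv) for m, lv in zip(mu, logvar)]


def negative_elbo(recon_nll, mu, logvar, beta=1.0):
    """Loss = reconstruction NLL + beta * total KL (beta is a hypothetical weighting knob)."""
    return recon_nll + beta * sum(gaussian_kl(mu, logvar))


def informative_dims(kl_per_dim, threshold=0.01):
    """Rough per-example diagnostic: count latent dims whose KL exceeds a
    hypothetical threshold. A count of zero means the posterior equals the prior."""
    return sum(1 for k in kl_per_dim if k > threshold)


if __name__ == "__main__":
    collapsed = gaussian_kl([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
    informative = gaussian_kl([1.5, -2.0, 0.0], [-1.0, -2.0, 0.0])
    print(informative_dims(collapsed), informative_dims(informative))
```

A posterior with mu = 0 and log-variance 0 in every dimension yields zero KL everywhere, so the decoder receives samples indistinguishable from the prior; in that regime an expressive decoder can minimize the reconstruction loss on its own, which is why strong Transformer decoders make collapse more likely.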
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Adam · Label Smoothing · Residual Connection · Byte Pair Encoding
