Finetuning Pretrained Transformers into Variational Autoencoders

Seongmin Park; Jihwa Lee

arXiv:2108.02446 · cs.CL · November 25, 2021

TL;DR

This paper introduces a simple two-phase finetuning method to convert pretrained Transformers into variational autoencoders, addressing posterior collapse without extensive pretraining, and evaluates its effectiveness.
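The TL;DR does not specify what the two phases are. As a purely illustrative sketch, assume (hypothetically) that phase 1 trains the model as a plain autoencoder with the KL term switched off, and phase 2 ramps the KL weight up linearly so the full VAE objective is only applied once the decoder has learned to use the latent code. The function names `kl_weight` and `vae_loss` and the step counts are hypothetical. They are not taken from the paper or its repository.

```python
def kl_weight(step: int, phase1_steps: int, anneal_steps: int, beta_max: float = 1.0) -> float:
    """Hypothetical two-phase KL weight schedule (an assumption, not the paper's exact recipe).

    Phase 1 (step < phase1_steps): KL term disabled, the model trains as a plain autoencoder.
    Phase 2: KL weight ramps linearly from 0 to beta_max over anneal_steps, then stays flat.
    """
    if step < phase1_steps:
        return 0.0
    progress = (step - phase1_steps) / max(anneal_steps, 1)
    return beta_max * min(1.0, progress)


def vae_loss(recon_nll: float, kl: float, step: int, phase1_steps: int, anneal_steps: int) -> float:
    # Negative ELBO with a scheduled KL weight: reconstruction NLL + beta(step) * KL.
    return recon_nll + kl_weight(step, phase1_steps, anneal_steps) * kl


if __name__ == "__main__":
    print([kl_weight(s, 100, 50) for s in (0, 99, 100, 125, 150, 400)])
```

Running the script prints `[0.0, 0.0, 0.0, 0.5, 1.0, 1.0]`: the weight stays at zero through phase 1, then climbs to its maximum over the annealing window. Keeping the KL term off at first is one common way to stop an expressive decoder from learning to ignore the latent code before the encoder produces useful signals.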

Contribution

It proposes a resource-efficient finetuning scheme that converts pretrained Transformers into VAEs without massive pretraining, making Transformer-based text VAEs accessible to researchers without extensive computing resources.

Findings

01. Competitive performance with massively pretrained Transformer-based VAEs on some metrics, while falling short on others

02. Effective mitigation of posterior collapse through the proposed method

03. Comprehensive analysis of existing collapse alleviation techniques

Abstract

Text variational autoencoders (VAEs) are notorious for posterior collapse, a phenomenon where the model's decoder learns to ignore signals from the encoder. Because posterior collapse is known to be exacerbated by expressive decoders, Transformers have seen limited adoption as components of text VAEs. Existing studies that incorporate Transformers into text VAEs (Li et al., 2020; Fang et al., 2021) mitigate posterior collapse using massive pretraining, a technique unavailable to most of the research community without extensive computing resources. We present a simple two-phase training scheme to convert a sequence-to-sequence Transformer into a VAE with just finetuning. The resulting language model is competitive with massively pretrained Transformer-based VAEs on some internal metrics while falling short on others. To facilitate training, we comprehensively explore the impact of common…
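Posterior collapse is usually diagnosed through the KL term of the VAE objective. With a diagonal Gaussian posterior q(z|x) = N(mu, diag(exp(logvar))) and a standard normal prior, the KL has a closed form, and a KL that stays near zero means the posterior matches the prior, so z carries almost no information for the decoder. The minimal sketch below assumes that standard Gaussian setup. The helper names and the `1e-2` threshold are hypothetical, and in practice collapse is judged on KL averaged over a dataset rather than on a single example.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def looks_collapsed(mu: Sequence[float], logvar: Sequence[float], threshold: float = 1e-2) -> bool:
    # Hypothetical heuristic: a KL near zero means the posterior is (almost) the prior,
    # i.e. the encoder passes essentially no information about the input to the decoder.
    return gaussian_kl(mu, logvar) < threshold


if __name__ == "__main__":
    print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))      # posterior equals the prior
    print(gaussian_kl([1.0, 0.0], [0.0, 0.0]))      # mean shifted in one dimension
    print(looks_collapsed([0.0, 0.0], [0.0, 0.0]))
```

The first call returns `0.0` and the second `0.5`, since shifting one mean by 1 adds mu^2 / 2 to the KL. A collapsed model produces posteriors like the first case for every input.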

Code & Models

Repositories

seongminp/transformers-into-vaes
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Adam · Label Smoothing · Residual Connection · Byte Pair Encoding