Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

Sang Michael Xie; Tengyu Ma; Percy Liang

arXiv:2006.16205·cs.LG·October 26, 2023·1 cites

TL;DR

This paper introduces composed fine-tuning, a method that preserves the output structure learned from unlabeled data by freezing a pre-trained denoiser, leading to better generalization in structured prediction tasks.

Contribution

The paper proposes composed fine-tuning, which preserves output structure by keeping a pre-trained denoiser frozen while the predictor is fine-tuned. This improves generalization, especially on out-of-distribution data.

Findings

01 Composed fine-tuning outperforms standard fine-tuning on pseudocode-to-code datasets.

02 It significantly improves generalization on out-of-distribution examples.

03 Theoretical analysis shows reduced predictor complexity with composed fine-tuning.

Abstract

We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g. pseudocode-to-code translation where the code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e. outputs without corresponding inputs, are freely available (e.g. code on GitHub) and provide information about output validity. We can capture the output structure by pre-training a denoiser to denoise corrupted versions of unlabeled outputs. We first show that standard fine-tuning after pre-training destroys some of this structure. We then propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser, which is frozen to preserve output structure. For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor, thus improving generalization.…
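The composition described in the abstract can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration and is not the paper's setup: `denoise` is a hand-written soft-rounding function standing in for a pre-trained denoiser, the base predictor is a one-dimensional linear map, and the data are made up. The point is only that gradient updates touch the predictor's parameters while the denoiser applied on top stays fixed.

```python
import math

# Hypothetical stand-in for a pre-trained denoiser: a smooth "soft rounding"
# map that pulls its input toward the nearest integer. It has no trainable
# parameters here, which mimics keeping the denoiser frozen.
def denoise(z):
    return z - math.sin(2 * math.pi * z) / (2 * math.pi)

def denoise_grad(z):
    # Derivative of denoise with respect to z, needed for the chain rule.
    return 1.0 - math.cos(2 * math.pi * z)

def composed_loss(w, c, xs, ys):
    # Mean squared error of denoise(predictor(x)) against the target y.
    return sum((denoise(w * x + c) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def composed_fine_tune(xs, ys, w=0.8, c=0.0, lr=0.005, steps=2000):
    # Gradient descent on the base predictor f(x) = w*x + c only.
    # The denoiser is applied on top of f but is never updated.
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_c = 0.0
        for x, y in zip(xs, ys):
            z = w * x + c
            common = 2.0 * (denoise(z) - y) * denoise_grad(z) / n
            grad_w += common * x
            grad_c += common
        w -= lr * grad_w
        c -= lr * grad_c
    return w, c

# Made-up data: inputs near integers, targets are the "valid" integer outputs.
xs = [0.2, 0.9, 1.1, 1.8, 2.2, 2.9]
ys = [0, 1, 1, 2, 2, 3]

loss_before = composed_loss(0.8, 0.0, xs, ys)
w, c = composed_fine_tune(xs, ys)
loss_after = composed_loss(w, c, xs, ys)
print(f"composed loss: {loss_before:.4f} -> {loss_after:.4f}")
```

In this sketch the predictor only has to land near a valid output, and the fixed denoiser does the rest. A real implementation would use a learned denoiser with its parameters excluded from the optimizer.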

Code & Models

Videos

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization · SlidesLive

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding