Gradient Ascent Post-training Enhances Language Model Generalization

Dongkeun Yoon; Joel Jang; Sungdong Kim; Minjoon Seo

arXiv:2306.07052·cs.CL·June 13, 2023·1 cites

TL;DR

This paper shows that a few steps of Gradient Ascent Post-training (GAP) on unlabeled text can improve the zero-shot generalization of pretrained language models across a range of NLP tasks, with the most reliable gains when the post-training corpus is out-of-distribution.

Contribution

The paper introduces GAP as an effective, task-agnostic post-training method that enhances language model generalization without additional labeled data.

Findings

1. GAP improves zero-shot performance across 12 NLP tasks.

2. Applying GAP on out-of-distribution corpora yields the most reliable improvements.

3. GAP lets smaller models reach performance comparable to models 2-3 times larger.

Abstract

In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can allow LMs to become comparable to 2-3 times larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
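To make the direction of the update concrete, here is a minimal toy sketch of gradient ascent on a language-modeling loss. It is not the authors' implementation: the abstract applies GAP to pretrained LMs with 350M to 2.7B parameters, whereas this sketch assumes a four-token unigram softmax model, and the names `nll`, `nll_grad`, and `gradient_ascent_steps`, the learning rate, and the step count are all hypothetical choices made for illustration. The only point it shows is that the update adds the loss gradient instead of subtracting it, so the loss on the sampled text goes up.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits, tokens):
    # Mean negative log-likelihood of `tokens` under a toy unigram model.
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in tokens) / len(tokens)

def nll_grad(logits, tokens):
    # Gradient of the mean NLL w.r.t. the logits:
    # softmax(logits) minus the empirical token frequencies.
    probs = softmax(logits)
    freq = [0.0] * len(logits)
    for t in tokens:
        freq[t] += 1.0 / len(tokens)
    return [p - f for p, f in zip(probs, freq)]

def gradient_ascent_steps(logits, tokens, lr=0.1, steps=3):
    # Hypothetical hyperparameters; the source only says "a few steps".
    logits = list(logits)
    for _ in range(steps):
        grad = nll_grad(logits, tokens)
        # Ascent: ADD the gradient, which increases the loss.
        # Ordinary training would subtract it.
        logits = [w + lr * g for w, g in zip(logits, grad)]
    return logits

# Toy "model" (assumed logits) and toy "unlabeled text" (assumed token ids).
toy_logits = [0.5, -0.2, 0.1, 0.0]
toy_tokens = [0, 0, 1, 2, 0, 3]

before = nll(toy_logits, toy_tokens)
updated = gradient_ascent_steps(toy_logits, toy_tokens)
after = nll(updated, toy_tokens)
print(f"loss before: {before:.4f}, loss after: {after:.4f}")
```

In this toy the loss after the update is higher than before, which is the intended effect of an ascent step. Whether and why such steps help zero-shot generalization in real LMs is the empirical claim of the paper and is not something this sketch can demonstrate.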

Code & Models

Repositories

kaist-lklab/gap (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis