Gradient Ascent Post-training Enhances Language Model Generalization
Dongkeun Yoon, Joel Jang, Sungdong Kim, Minjoon Seo

TL;DR
This paper demonstrates that a simple post-training step, Gradient Ascent Post-training (GAP), can substantially improve the zero-shot generalization of pretrained language models across diverse NLP tasks, with the most reliable gains when GAP is applied to out-of-distribution text.
Contribution
The paper introduces GAP, a task-agnostic post-training method that improves language model generalization using only a few gradient ascent steps on random, unlabeled text, with no labeled data and no task-specific fine-tuning.
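As the name indicates, GAP reverses the sign of the usual training update. A minimal formulation, assuming the standard next-token negative log-likelihood as the loss (this page does not spell out the objective, learning rate, or step count), with θ the pretrained parameters, x = (x_1, …, x_T) a sequence of unlabeled text, and η a learning rate:

```latex
\begin{aligned}
\mathcal{L}(\theta; x) &= -\frac{1}{T}\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}) \\
\text{gradient descent (ordinary training):}\quad \theta &\leftarrow \theta - \eta\,\nabla_\theta \mathcal{L}(\theta; x) \\
\text{gradient ascent (GAP):}\quad \theta &\leftarrow \theta + \eta\,\nabla_\theta \mathcal{L}(\theta; x)
\end{aligned}
```

Per the abstract, the ascent update is applied for only a few steps.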
Findings
GAP improves zero-shot performance across 12 NLP tasks.
Applying GAP to out-of-distribution corpora yields the most reliable improvements.
GAP lets smaller models (350M, 1.3B, 2.7B) perform comparably to LMs 2-3x their size.
Abstract
In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can allow LMs to become comparable to 2-3x larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
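To make "a few steps of gradient ascent on random, unlabeled text" concrete, here is a toy sketch. It is not the authors' code: a tiny bigram model in pure Python stands in for a pretrained LM, uniformly random token ids stand in for the unlabeled corpus, and all function names (`nll_and_grad`, `gap_step`, `gradient_ascent_post_train`) and hyperparameters are hypothetical. The only point it illustrates is the flipped sign: parameters move along +gradient of the next-token loss.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def nll_and_grad(W, tokens):
    """Average next-token negative log-likelihood of a toy bigram LM and its gradient.

    W[prev][nxt] is the logit of token `nxt` following token `prev`.
    (Toy stand-in for a pretrained LM; an assumption, not the paper's models.)
    """
    V = len(W)
    grad = [[0.0] * V for _ in range(V)]
    pairs = list(zip(tokens[:-1], tokens[1:]))
    loss = 0.0
    for prev, nxt in pairs:
        p = softmax(W[prev])
        loss -= math.log(p[nxt])
        # d(-log softmax(z)[nxt]) / dz_j = p_j - 1[j == nxt]
        for j in range(V):
            grad[prev][j] += p[j] - (1.0 if j == nxt else 0.0)
    n = len(pairs)
    return loss / n, [[g / n for g in row] for row in grad]


def gap_step(W, tokens, lr):
    """One gradient-ascent step: theta <- theta + lr * grad (note the plus sign).

    Returns the updated weights and the loss measured before the update.
    """
    loss, grad = nll_and_grad(W, tokens)
    new_W = [[w + lr * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad)]
    return new_W, loss


def gradient_ascent_post_train(W, corpus, steps, lr):
    """Apply a small number of ascent steps, cycling through unlabeled sequences."""
    losses = []
    for step in range(steps):
        W, loss = gap_step(W, corpus[step % len(corpus)], lr)
        losses.append(loss)
    return W, losses


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 5
    # Stand-in for pretrained weights; a real run would start from an actual LM.
    W = [[rng.gauss(0.0, 0.1) for _ in range(vocab_size)] for _ in range(vocab_size)]
    # Stand-in for "random, unlabeled text": uniformly random token ids.
    corpus = [[rng.randrange(vocab_size) for _ in range(20)] for _ in range(3)]
    W_post, losses = gradient_ascent_post_train(W, corpus, steps=6, lr=0.5)
    print("loss before each GAP step:", [round(x, 4) for x in losses])
```

Because softmax cross-entropy is convex in the logits, each ascent step on a fixed sequence strictly raises that sequence's loss; in this setting the number of steps is kept small, since running the ascent indefinitely would only keep degrading the model's fit to that text.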
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
