HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation

Hongyi Yuan; Zheng Yuan; Chuanqi Tan; Fei Huang; Songfang Huang

arXiv:2212.08853 · cs.CL · May 12, 2023 · 1 citation

TL;DR

HyPe introduces a novel fine-tuning method for pre-trained language models that perturbs hidden representations to improve robustness, generalization, and performance on NLP tasks with minimal computational cost.

Contribution

This work proposes HyPe, a new fine-tuning technique that perturbs hidden representations of Transformer layers to enhance model robustness and performance.

Findings

01

HyPe outperforms vanilla fine-tuning on GLUE and NLI datasets.

02

HyPe improves generalization of hidden representations across layers.

03

HyPe adds negligible computational overhead and is compatible with existing methods.

Abstract

Language models with the Transformer architecture have shown strong performance in natural language processing. However, problems such as over-fitting and representation collapse still arise when fine-tuning pre-trained language models (PLMs) on downstream tasks. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates such problems by perturbing the hidden representations of Transformer layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making the Transformer layers more robust to hidden representation perturbations can further benefit the fine-tuning of PLMs as a whole. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe…
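The core idea in the abstract, adding noise to the hidden representation that each layer receives during fine-tuning, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the Gaussian noise, the `scale` value, and the `toy_layer` stand-in for a Transformer layer are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def perturb(hidden, scale, rng, training=True):
    """Return a copy of `hidden` with zero-mean Gaussian noise added.

    Noise is applied only in training mode. The Gaussian distribution and
    the `scale` parameter are assumptions made for this sketch.
    """
    if not training or scale == 0.0:
        return list(hidden)
    return [h + rng.gauss(0.0, scale) for h in hidden]


def toy_layer(hidden, weight):
    """Hypothetical stand-in for one Transformer layer (a plain scaling)."""
    return [weight * h for h in hidden]


def forward(inputs, weights, scale, rng, training=True):
    """Run the toy stack, perturbing the hidden state fed into each layer."""
    hidden = list(inputs)
    for weight in weights:
        hidden = perturb(hidden, scale, rng, training)
        hidden = toy_layer(hidden, weight)
    return hidden


rng = random.Random(0)
# Evaluation mode: no perturbation, so the output is deterministic.
clean = forward([1.0, 2.0], [0.5, 2.0], scale=0.1, rng=rng, training=False)
# Training mode: every layer sees a noisy version of its input.
noisy = forward([1.0, 2.0], [0.5, 2.0], scale=0.1, rng=rng, training=True)
print(clean)
print(noisy)
```

In a real fine-tuning setup the same pattern would presumably wrap the input of each Transformer layer in the model's forward pass, with the perturbation switched off at inference time, as the `training` flag does here.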

Code & Models

Repositories

yuanhy1997/hype
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Softmax · Graph Self-Attention · RAdam · Hyperboloid Embeddings