HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation
Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, Songfang Huang

TL;DR
HyPe introduces a novel fine-tuning method for pre-trained language models that perturbs hidden representations to improve robustness, generalization, and performance on NLP tasks with minimal computational cost.
Contribution
This work proposes HyPe, a new fine-tuning technique that perturbs hidden representations of Transformer layers to enhance model robustness and performance.
Findings
HyPe outperforms vanilla fine-tuning on GLUE and NLI datasets.
HyPe improves generalization of hidden representations across layers.
HyPe adds negligible computational overhead and is compatible with existing methods.
Abstract
Language models with the Transformer architecture have shown strong performance in natural language processing. However, problems such as over-fitting and representation collapse still arise when fine-tuning pre-trained language models (PLMs) on downstream tasks. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates such problems by perturbing the hidden representations of Transformer layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making the Transformer layers more robust to hidden representation perturbations can further benefit the fine-tuning of PLMs en bloc. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe…
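The core idea in the abstract, adding noise to the hidden representation that feeds each Transformer layer during fine-tuning, can be illustrated with a minimal sketch. This is not the authors' implementation: the noise distributions (Gaussian or uniform), the default scale, the choice to perturb before every layer, and the names `perturb`, `forward`, and `toy_layers` are all assumptions made for illustration, and the "layers" here are toy functions on lists of floats rather than real Transformer blocks.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def perturb(hidden: Sequence[float], scale: float,
            rng: random.Random, kind: str = "normal") -> Vector:
    """Return a copy of `hidden` with i.i.d. noise added to each element.

    The distributions and the meaning of `scale` are illustrative assumptions.
    """
    if kind == "normal":
        return [h + rng.gauss(0.0, scale) for h in hidden]
    if kind == "uniform":
        return [h + rng.uniform(-scale, scale) for h in hidden]
    raise ValueError(f"unknown noise kind: {kind!r}")


def forward(layers: Sequence[Layer], x: Sequence[float], *,
            training: bool, scale: float = 1e-5,
            rng: Optional[random.Random] = None,
            kind: str = "normal") -> Vector:
    """Run `x` through `layers`, perturbing each layer's input when training.

    With training=False no noise is added, so inference is unchanged.
    """
    rng = rng or random.Random()
    hidden = list(x)
    for layer in layers:
        if training:
            hidden = perturb(hidden, scale, rng, kind)
        hidden = layer(hidden)
    return hidden


# Hypothetical stand-ins for Transformer layers: each doubles its input.
toy_layers: List[Layer] = [
    lambda h: [2.0 * v for v in h],
    lambda h: [2.0 * v for v in h],
]

clean = forward(toy_layers, [1.0, -2.0, 0.5], training=False)
noisy = forward(toy_layers, [1.0, -2.0, 0.5], training=True,
                scale=0.01, rng=random.Random(0), kind="uniform")
```

Because the noise is applied only when `training=True`, the sketch leaves evaluation-time behavior untouched, and the per-layer cost is a single element-wise addition, which is in line with the summary's claim of negligible overhead.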
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Softmax · Graph Self-Attention · RAdam · Hyperboloid Embeddings
