From Robustness to Improved Generalization and Calibration in Pre-trained Language Models

Josip Jukić; Jan Šnajder

arXiv:2404.00758 · cs.CL · April 2, 2024 · 1 citation

TL;DR

This paper introduces JacHess, a novel regularization method for pre-trained language models that improves their generalization and calibration by minimizing Jacobian and Hessian norms, addressing challenges unique to NLP.

Contribution

The paper proposes JacHess, a two-phase regularization technique that enhances PLM performance by controlling representation smoothness through Jacobian and Hessian regularization.
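To make the idea concrete, one generic way to write a loss that penalizes both derivative norms is sketched below. This is an illustrative assumption, not the paper's exact objective: h_l (the intermediate representation at layer l), e (the continuous input embeddings, standing in for the discrete tokens), the layer set S, and the weights lambda_J and lambda_H are hypothetical names, and the summary does not say how the two phases divide these terms.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(\theta)
  + \lambda_J \sum_{l \in \mathcal{S}} \left\| \frac{\partial \mathbf{h}_l}{\partial \mathbf{e}} \right\|_F^2
  + \lambda_H \sum_{l \in \mathcal{S}} \sum_{k} \left\| \frac{\partial^2 h_{l,k}}{\partial \mathbf{e}\, \partial \mathbf{e}^{\top}} \right\|_F^2
```

Here h_{l,k} is the k-th component of h_l, so the Hessian term sums the squared Frobenius norms of one Hessian matrix per output dimension.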

Findings

1. JacHess outperforms unregularized fine-tuning on the GLUE benchmark.

2. JacHess improves model calibration and in-domain generalization.

3. The regularization effectively addresses NLP-specific challenges.
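Calibration is often quantified with expected calibration error (ECE), the bin-weighted gap between a model's confidence and its accuracy. The summary does not say which calibration metric the paper reports, so the following is a generic sketch; the function name is hypothetical, and equal-width confidence bins are an assumed design choice.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Hypothetical helper: ECE with equal-width bins over top-class confidence.

    probs:  (N, K) array of predicted class probabilities.
    labels: (N,) array of integer gold labels.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)           # confidence of the predicted class
    predictions = probs.argmax(axis=1)        # predicted class index
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by the bin's share of samples
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```

A lower ECE means that, for example, predictions made with roughly 80% confidence are correct roughly 80% of the time.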

Abstract

Enhancing generalization and uncertainty quantification in pre-trained language models (PLMs) is crucial for their effectiveness and reliability. Building on machine learning research that established the importance of robustness for improving generalization, we investigate the role of representation smoothness, achieved via Jacobian and Hessian regularization, in enhancing PLM performance. Although such regularization methods have proven effective in computer vision, their application in natural language processing (NLP), where PLM inputs are derived from a discrete domain, poses unique challenges. We introduce a novel two-phase regularization approach, JacHess, which minimizes the norms of the Jacobian and Hessian matrices within PLM intermediate representations relative to their inputs. Our evaluation using the GLUE benchmark demonstrates that JacHess significantly improves in-domain…
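Exact Jacobians and Hessians of a large PLM are expensive to materialize, so penalties of this kind are commonly estimated with random projections. The sketch below assumes a vector-valued function f of a flattened input-embedding vector x (embeddings being one way around the discrete-input problem) and uses Hutchinson-style estimators: for u ~ N(0, I), E ||J u||^2 equals ||J||_F^2, and for independent u, w ~ N(0, I), E ||D^2 f[u, w]||^2 equals the summed squared Frobenius norms of the per-output Hessians. All function names are hypothetical, finite differences stand in for the autodiff Jacobian-vector products a real implementation would use, and the summary does not state whether the paper uses exactly these estimators.

```python
import numpy as np


def jacobian_frobenius_sq(f, x, n_samples=64, eps=1e-3, rng=None):
    """Hypothetical helper: estimate ||J_f(x)||_F^2 as E_u ||J u||^2, u ~ N(0, I).

    f maps a 1-D float array (e.g. flattened input embeddings) to a 1-D
    float array (e.g. a flattened intermediate representation).
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # Central finite difference approximates the Jacobian-vector product J u.
        jvp = (f(x + eps * u) - f(x - eps * u)) / (2.0 * eps)
        total += float(np.sum(jvp ** 2))
    return total / n_samples


def hessian_frobenius_sq(f, x, n_samples=64, eps=1e-3, rng=None):
    """Hypothetical helper: estimate sum_k ||H_k(x)||_F^2 (H_k = Hessian of output k).

    Relies on E[(u^T H_k w)^2] = ||H_k||_F^2 for independent u, w ~ N(0, I).
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_samples):
        a = eps * rng.standard_normal(x.shape)
        b = eps * rng.standard_normal(x.shape)
        # Mixed central difference approximates the vector [u^T H_k w] over outputs k.
        mixed = (f(x + a + b) - f(x + a - b) - f(x - a + b) + f(x - a - b)) / (4.0 * eps ** 2)
        total += float(np.sum(mixed ** 2))
    return total / n_samples


def smoothness_penalty(f, x, lam_j=1.0, lam_h=1.0, n_samples=64, rng=None):
    """Hypothetical weighted sum of both estimates for one representation."""
    rng = np.random.default_rng() if rng is None else rng
    return (lam_j * jacobian_frobenius_sq(f, x, n_samples, rng=rng)
            + lam_h * hessian_frobenius_sq(f, x, n_samples, rng=rng))


if __name__ == "__main__":
    # Linear map: the Jacobian is A, so the estimate should approach ||A||_F^2 = 31.
    A = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]])
    x = np.array([0.5, -1.0])
    print(jacobian_frobenius_sq(lambda z: A @ z, x, n_samples=20000,
                                rng=np.random.default_rng(0)))
```

For a linear map the Hessian estimate is zero up to rounding, which is a quick sanity check. To use such a penalty during training, the same estimator would be written with a framework's differentiable JVPs so that its value can be backpropagated into the model parameters.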

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling