From Robustness to Improved Generalization and Calibration in Pre-trained Language Models
Josip Jukić, Jan Šnajder

TL;DR
This paper introduces JacHess, a novel regularization method for pre-trained language models that improves their generalization and calibration by minimizing Jacobian and Hessian norms, addressing challenges unique to NLP.
Contribution
The paper proposes JacHess, a two-phase regularization technique that enhances PLM performance by controlling representation smoothness through Jacobian and Hessian regularization.
Findings
JacHess outperforms unregularized fine-tuning on the GLUE benchmark.
JacHess improves model calibration and in-domain generalization.
The regularization addresses a challenge specific to NLP: PLM inputs are derived from a discrete domain.
Abstract
Enhancing generalization and uncertainty quantification in pre-trained language models (PLMs) is crucial for their effectiveness and reliability. Building on machine learning research that established the importance of robustness for improving generalization, we investigate the role of representation smoothness, achieved via Jacobian and Hessian regularization, in enhancing PLM performance. Although such regularization methods have proven effective in computer vision, their application in natural language processing (NLP), where PLM inputs are derived from a discrete domain, poses unique challenges. We introduce a novel two-phase regularization approach, JacHess, which minimizes the norms of the Jacobian and Hessian matrices within PLM intermediate representations relative to their inputs. Our evaluation using the GLUE benchmark demonstrates that JacHess significantly improves in-domain…
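To make the idea of penalizing a Jacobian norm concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single hypothetical layer h = tanh(Wx) standing in for a PLM intermediate representation, treats the input x as a continuous vector (such as an embedding), and estimates the squared Frobenius norm of the Jacobian with a Hutchinson-style random projection and finite differences. The estimator, the function names, and the weight `lam` are illustrative assumptions; the summary above does not specify how JacHess computes or weights its norms, and the Hessian term is omitted here.

```python
import math
import random


def layer(W, x):
    """Toy stand-in for an intermediate representation: h = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def jacobian_fro_sq_exact(W, x):
    """Exact ||J||_F^2 for h = tanh(W x), using J[i][j] = (1 - h_i^2) * W[i][j]."""
    h = layer(W, x)
    return sum(((1.0 - hi * hi) * w) ** 2 for hi, row in zip(h, W) for w in row)


def jacobian_fro_sq_estimate(f, x, num_samples=2000, eps=1e-5, seed=0):
    """Hutchinson-style estimate of ||J||_F^2 (an assumed, illustrative choice).

    For v ~ N(0, I), E[||J v||^2] = trace(J^T J) = ||J||_F^2.
    J v is approximated with a central finite difference of f around x.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.gauss(0.0, 1.0) for _ in x]
        plus = f([xi + eps * vi for xi, vi in zip(x, v)])
        minus = f([xi - eps * vi for xi, vi in zip(x, v)])
        jv = [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]
        total += sum(c * c for c in jv)
    return total / num_samples


def regularized_loss(task_loss, jacobian_penalty, lam=0.01):
    """Task loss plus a weighted smoothness penalty (lam is a hypothetical weight)."""
    return task_loss + lam * jacobian_penalty


if __name__ == "__main__":
    W = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]  # hypothetical layer weights
    x = [0.2, -0.1, 0.4]                      # hypothetical continuous input
    exact = jacobian_fro_sq_exact(W, x)
    estimate = jacobian_fro_sq_estimate(lambda z: layer(W, z), x)
    print("exact:", exact, "estimate:", estimate)
    print("regularized loss:", regularized_loss(0.7, estimate))
```

The random-projection estimate avoids forming the full Jacobian, which is the usual reason to prefer it when the representation is high-dimensional. In a real fine-tuning setup the finite difference would typically be replaced by automatic differentiation.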
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
