Posterior Differential Regularization with f-divergence for Improving Model Robustness

Hao Cheng; Xiaodong Liu; Lis Pereira; Yaoliang Yu; Jianfeng Gao

arXiv:2010.12638 · cs.CL · April 13, 2021 · 1 citation

TL;DR

This paper introduces a novel regularization framework based on f-divergences to improve NLP model robustness, connecting existing methods and demonstrating enhanced generalization in various scenarios.

Contribution

It generalizes posterior differential regularization using f-divergences and empirically shows improved robustness and generalization for BERT models across tasks.

Findings

01. Regularizing with f-divergence improves model robustness.

02. BERT-base with proper regularization matches BERT-large performance.

03. Enhanced in-domain and out-of-domain generalization observed.

Abstract

We address the problem of enhancing model robustness through regularization. Specifically, we focus on methods that regularize the model posterior difference between clean and noisy inputs. Theoretically, we provide a connection between two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework. Additionally, we generalize the posterior differential regularization to the family of $f$-divergences and characterize the overall regularization framework in terms of the Jacobian matrix. Empirically, we systematically compare those regularizations and standard BERT training on a diverse set of tasks to provide a comprehensive profile of their effect on model in-domain and out-of-domain generalization. For both fully supervised and semi-supervised settings, our experiments show that regularizing the posterior differential with $f$-divergence can result in…
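To make the idea of regularizing the posterior difference between clean and noisy inputs concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy linear classifier, the Gaussian input noise, the helper names (`linear_logits`, `posterior_differential`), and the direction of the divergence (clean posterior against noisy posterior) are all assumptions made for illustration. It uses the standard definition $D_f(P \| Q) = \sum_i q_i \, f(p_i / q_i)$ with $f$ convex and $f(1) = 0$, and shows three common generators.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i), for convex f with f(1) = 0.

    Assumes every q_i > 0, which holds for softmax outputs in practice.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


# Generator functions for three well-known f-divergences.
def f_kl(t):
    # Kullback-Leibler: f(t) = t log t
    return t * math.log(t)


def f_sq_hellinger(t):
    # Squared Hellinger (up to a constant-factor convention): f(t) = (sqrt(t) - 1)^2
    return (math.sqrt(t) - 1.0) ** 2


def f_js(t):
    # Jensen-Shannon: f(t) = 0.5 * [t log(2t / (1 + t)) + log(2 / (1 + t))]
    return 0.5 * (t * math.log(2.0 * t / (1.0 + t)) + math.log(2.0 / (1.0 + t)))


def linear_logits(W, x):
    """Hypothetical toy model: logits = W x (stands in for a real network)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def posterior_differential(W, x, f, noise_std=0.1, rng=None):
    """f-divergence between the posteriors on a clean and a noisy input.

    The Gaussian perturbation is an assumption of this sketch; in training,
    this term would be added to the task loss with a weighting coefficient.
    """
    rng = rng or random.Random(0)
    x_noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
    p_clean = softmax(linear_logits(W, x))
    p_noisy = softmax(linear_logits(W, x_noisy))
    return f_divergence(p_clean, p_noisy, f)


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7]]  # 2 classes, 3 input features
    x = [1.0, 2.0, -1.0]
    for name, f in [("KL", f_kl), ("sq. Hellinger", f_sq_hellinger), ("JS", f_js)]:
        print(name, posterior_differential(W, x, f, noise_std=0.5))
```

Each generator yields a non-negative penalty that vanishes when the clean and noisy posteriors agree, so swapping `f` changes how disagreement is measured without changing the rest of the training loop.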

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Linear Layer · Layer Normalization · Softmax · Adam · Dense Connections · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay