Posterior Differential Regularization with f-divergence for Improving Model Robustness
Hao Cheng, Xiaodong Liu, Lis Pereira, Yaoliang Yu, Jianfeng Gao

TL;DR
This paper introduces a novel regularization framework based on f-divergences to improve NLP model robustness, connecting existing methods and demonstrating enhanced generalization in various scenarios.
Contribution
It generalizes posterior differential regularization using f-divergences and empirically shows improved robustness and generalization for BERT models across tasks.
Findings
Regularizing with f-divergence improves model robustness.
BERT-base with proper regularization matches BERT-large performance.
Enhanced in-domain and out-of-domain generalization observed.
Abstract
We address the problem of enhancing model robustness through regularization. Specifically, we focus on methods that regularize the model posterior difference between clean and noisy inputs. Theoretically, we provide a connection of two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework. Additionally, we generalize the posterior differential regularization to the family of f-divergences and characterize the overall regularization framework in terms of Jacobian matrix. Empirically, we systematically compare those regularizations and standard BERT training on a diverse set of tasks to provide a comprehensive profile of their effect on model in-domain and out-of-domain generalization. For both fully supervised and semi-supervised settings, our experiments show that regularizing the posterior differential with f-divergence can result in…
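The idea of penalizing the posterior difference between clean and noisy inputs can be illustrated with a small NumPy sketch. It is not the authors' implementation. The names `posterior_differential`, `f_divergence`, `F_GENERATORS`, `model_fn`, and `noise_std` are hypothetical, Gaussian input noise is an assumption, and the three generator functions are common members of the f-divergence family chosen for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Generator functions f (convex, f(1) = 0) for D_f(P || Q) = sum_i q_i * f(p_i / q_i).
# Illustrative choices only; constant-factor conventions vary between references.
F_GENERATORS = {
    "kl": lambda t: t * np.log(t),
    "squared_hellinger": lambda t: (np.sqrt(t) - 1.0) ** 2,
    "jensen_shannon": lambda t: 0.5 * (t * np.log(t) - (t + 1.0) * np.log((t + 1.0) / 2.0)),
}


def f_divergence(p, q, name="kl", eps=1e-12):
    """Per-example f-divergence between two batches of categorical posteriors."""
    p = np.clip(p, eps, 1.0)  # guard against log(0) and division by zero
    q = np.clip(q, eps, 1.0)
    return np.sum(q * F_GENERATORS[name](p / q), axis=-1)


def posterior_differential(model_fn, x, noise_std=1e-3, name="kl", rng=None):
    """Mean f-divergence between posteriors on clean and noise-perturbed inputs.

    model_fn maps a batch of inputs to logits; Gaussian noise is an assumption
    made for this sketch.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.normal(scale=noise_std, size=x.shape)
    p_clean = softmax(model_fn(x))
    p_noisy = softmax(model_fn(x + noise))
    return float(f_divergence(p_clean, p_noisy, name).mean())
```

In a training loop, a term like this would typically be added to the task loss with a weighting coefficient, so that the model is encouraged to keep its predictions stable under small input perturbations.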
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Linear Layer · Layer Normalization · Softmax · Adam · Dense Connections · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay
