Distributionally Robust Language Modeling

Yonatan Oren; Shiori Sagawa; Tatsunori B. Hashimoto; Percy Liang

arXiv:1909.02060·cs.CL·September 6, 2019

TL;DR

This paper introduces a distributionally robust training method for language models that improves performance on unknown test distributions by minimizing the worst-case loss over topic mixtures, yielding a 5.5 point perplexity reduction over MLE.

Contribution

It proposes topic conditional value at risk (topic CVaR), a new DRO procedure for training language models that perform well across diverse, unseen topic distributions without prior knowledge of the test data.

Findings

01

Achieved a 5.5 point perplexity reduction over MLE on the Yelp reviews test set.

02

Demonstrated robustness of the model across different topic distributions.

03

Improved generalization compared to standard MLE training.

Abstract

Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the knowledge of the test distribution, we propose an approach which trains a model that performs well over a wide range of potential test distributions. In particular, we derive a new distributionally robust optimization (DRO) procedure which minimizes the loss of the model over the worst-case mixture of topics with sufficient overlap with the training distribution. Our approach, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language…
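The worst-case mixture of topics described in the abstract can be illustrated with a small sketch. It assumes, as is standard for CVaR-style uncertainty sets, that "sufficient overlap" means every topic weight q_z in the test mixture is bounded by p_z / alpha, where p_z is the training proportion of topic z and alpha in (0, 1] is a robustness parameter. The function name, argument names, and this exact form of the constraint are illustrative assumptions, not taken from the paper's code. Under that assumption, the inner maximization has a greedy solution: fill the highest-loss topics up to their caps until the mixture sums to one.

```python
def worst_case_topic_loss(topic_losses, topic_probs, alpha):
    """Worst-case expected loss over topic mixtures q with q_z <= p_z / alpha.

    topic_losses: average model loss on each topic (hypothetical inputs).
    topic_probs:  training proportion p_z of each topic; should sum to 1.
    alpha:        overlap parameter in (0, 1]; alpha = 1 recovers the
                  ordinary average loss under the training mixture.
    Returns (worst_case_loss, q).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(topic_losses) != len(topic_probs):
        raise ValueError("one loss and one probability per topic are required")

    # Visit topics from highest to lowest loss.
    order = sorted(range(len(topic_losses)),
                   key=lambda z: topic_losses[z], reverse=True)

    q = [0.0] * len(topic_losses)
    remaining = 1.0  # probability mass still to be assigned
    for z in order:
        # Give each topic as much mass as its cap p_z / alpha allows.
        mass = min(topic_probs[z] / alpha, remaining)
        q[z] = mass
        remaining -= mass
        if remaining <= 0.0:
            break

    worst_loss = sum(q_z * loss for q_z, loss in zip(q, topic_losses))
    return worst_loss, q


if __name__ == "__main__":
    # Three hypothetical topics; the third has the highest loss.
    loss, q = worst_case_topic_loss([1.0, 2.0, 4.0], [0.5, 0.25, 0.25], alpha=0.5)
    print(loss, q)
```

In a training loop, a robust objective of this kind would be minimized with respect to the model parameters while the mixture q is re-solved as the per-topic losses change; how the paper estimates per-topic losses and performs the updates is not covered by this sketch.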

Code & Models

Repositories

https://worksheets.codalab.org/worksheets/0xf8122ebd24e94209a2a1764007509098
none · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis