The Effect of Model Size on Worst-Group Generalization
Alan Pham, Eunice Chan, Vikranth Srivatsa, Dhruba Ghosh, Yaoqing Yang, Yaodong Yu, Ruiqi Zhong, Joseph E. Gonzalez, Jacob Steinhardt

TL;DR
This study systematically examines how increasing model size affects worst-group generalization under empirical risk minimization (ERM), finding that larger pre-trained models improve worst-group performance on vision and NLP tasks even when subgroup labels are not available.
Contribution
It provides the first comprehensive analysis of model size effects on worst-group generalization across multiple architectures, domains, and initialization methods.
Findings
Under ERM, larger models do not harm and may improve worst-group test accuracy.
Pre-trained larger models consistently outperform smaller ones on Waterbirds and MultiNLI.
Increasing model size is recommended when subgroup labels are unavailable.
Abstract
Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known. To gain a more complete picture, we consider the case where subgroup information is unknown. We investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings, varying: 1) architectures (ResNet, VGG, or BERT), 2) domains (vision or natural language processing), 3) model size (width or depth), and 4) initialization (with pre-trained or random weights). Our systematic evaluation reveals that increasing model size does not hurt, and may help, worst-group test performance under ERM across all setups. In particular, increasing pre-trained model size consistently improves performance on Waterbirds and MultiNLI. We advise practitioners to use larger pre-trained models when…
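The metric the abstract refers to, worst-group test accuracy, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `worst_group_accuracy` and the toy labels below are hypothetical, and the sketch assumes that group annotations are available at evaluation time even though ERM training itself does not use them.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (worst accuracy, worst group, per-group accuracies).

    Hypothetical helper for illustration: accuracy is computed separately
    for each group, and the minimum over groups is reported.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(label == pred)

    per_group = {g: correct[g] / total[g] for g in total}
    worst_group = min(per_group, key=per_group.get)
    return per_group[worst_group], worst_group, per_group


# Toy data (made up): group "a" is classified perfectly, group "b" is not.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]

worst_acc, worst_group, per_group = worst_group_accuracy(y_true, y_pred, groups)
average_acc = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"average accuracy: {average_acc:.3f}")
print(f"worst-group accuracy: {worst_acc:.3f} (group {worst_group})")
```

In this toy case the average accuracy looks moderate while one group is mostly misclassified, which is the gap that a worst-group metric is meant to expose.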
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Machine Learning and Data Classification
Methods: Max Pooling · Softmax · Dense Connections · Dropout · Convolution
