Analyzing Bagging Methods for Language Models
Pranab Islam, Shaan Khosla, Arthur Lok, Mudit Saxena

TL;DR
This paper analyzes the effectiveness of bagging methods for language models, finding that bagged ensembles are at best roughly equivalent to single models of similar size, with some benefits such as variance reduction.
Contribution
It provides a systematic comparison of bagged ensembles and single language models of roughly equal total size, highlighting their comparable performance and specific advantages such as variance reduction.
Findings
At best, bagged ensembles are roughly equivalent in performance to single models of similar size.
Ensembling reduces variance in specific scenarios.
Minor performance improvements are observed in certain scenarios.
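The variance-reduction finding is consistent with a standard statistical result for averaged predictors. The sketch below is a generic illustration only and does not reproduce the paper's experiments. It assumes each ensemble member's prediction error has variance sigma2 and that any two members' errors have correlation rho. The helper names `ensemble_variance` and `simulate` are hypothetical. Under these assumptions, averaging k members gives variance rho*sigma2 + (1 - rho)*sigma2/k. The benefit therefore shrinks as members become more correlated, and members trained on overlapping bootstrap samples are typically correlated.

```python
import random
import statistics


def ensemble_variance(sigma2: float, rho: float, k: int) -> float:
    """Closed-form variance of the mean of k identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / k


def simulate(k: int, rho: float, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of ensemble_variance with sigma2 = 1.

    Each member's error is a shared component plus a private component,
    scaled so that every member has variance 1 and any two members have
    correlation rho.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)
        members = [
            rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            for _ in range(k)
        ]
        averages.append(sum(members) / k)
    return statistics.pvariance(averages)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, ensemble_variance(1.0, rho, 5), round(simulate(5, rho), 3))
```

With five members, independent errors (rho = 0) reduce the variance to one fifth. With rho = 0.9, 0.92 of the original variance remains.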
Abstract
Modern language models leverage increasingly large numbers of parameters to achieve strong performance on natural language understanding tasks. Ensembling these models in specific configurations for downstream tasks shows even further performance improvements. In this paper, we analyze bagging of language models and compare single language models to bagged ensembles of roughly equivalent final model size. We explore an array of model bagging configurations for natural language understanding tasks, with final ensemble sizes ranging from 300M to 1.5B parameters, and find that our ensembling methods are at best roughly equivalent to single LM baselines. In specific scenarios, our experiments also show other positive effects of bagging and pruning, such as variance reduction and minor performance improvements.
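Bagging trains each ensemble member on a bootstrap resample of the training data and then combines the members' outputs. The following is a minimal sketch that assumes soft voting (averaging class probabilities) as the combination rule. It does not reproduce the paper's bagging configurations or its pruning step. `train_prior_model` is a hypothetical stand-in for fine-tuning a language model on a resample, and all helper names are illustrative. In a size-matched comparison like the one described above, the members' parameter counts would sum to roughly the size of the single baseline model.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Example = Tuple[str, int]              # (input text, class label)
Model = Callable[[str], List[float]]   # maps input text to class probabilities


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(data) items uniformly at random with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def train_prior_model(sample: Sequence[Example], num_classes: int = 2) -> Model:
    """Hypothetical stand-in for fine-tuning an LM: predicts the smoothed
    label frequencies of its bootstrap sample, ignoring the input text."""
    counts = [1] * num_classes  # add-one smoothing so no class gets probability 0
    for _, label in sample:
        counts[label] += 1
    total = sum(counts)
    probs = [c / total for c in counts]
    return lambda text: list(probs)


def bag(data: Sequence[Example], train_fn: Callable[[List[Example]], Model],
        k: int, seed: int = 0) -> List[Model]:
    """Train k members, each on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(k)]


def soft_vote(models: Sequence[Model], text: str) -> List[float]:
    """Average the members' class-probability vectors for one input."""
    if not models:
        raise ValueError("need at least one model")
    outputs = [m(text) for m in models]
    return [sum(o[c] for o in outputs) / len(outputs) for c in range(len(outputs[0]))]


if __name__ == "__main__":
    data = [("great movie", 1), ("dull plot", 0), ("loved it", 1), ("fine", 1)]
    members = bag(data, train_prior_model, k=5)
    print(soft_vote(members, "an unseen review"))
```

Soft voting is only one possible combination rule. Majority voting over each member's predicted label is a common alternative.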
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Pruning
