Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models
Nancy Tyagi, Aidin Shiri, Surjodeep Sarkar, Abhishek Kumar Umrawal, Manas Gaur

TL;DR
This paper demonstrates that ensembling smaller foundational language models like BERT with novel techniques can outperform larger models, reducing hallucination and uncertainty in NLP tasks, especially in sensitive applications.
Contribution
It introduces a new ensemble approach, including a knowledge-guided deep ensemble, showing smaller models can surpass larger ones in performance.
Findings
Deep-Ensemble BERT outperforms BERT-large on benchmark datasets.
Ensembling enhances the coordination among FLMs, improving NLP task accuracy.
Smaller models with ensembling reduce hallucination and uncertainty.
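As a rough illustration of what ensembling several small classifiers involves, the sketch below averages the class probabilities of a few models and reports the entropy of the averaged distribution as a simple uncertainty signal. It is a generic probability-averaging ensemble, not the paper's shallow, deep, or knowledge-guided variants; the function names and the logits are hypothetical and exist only for this example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    """Average class probabilities across ensemble members.

    member_logits: array of shape (n_models, n_examples, n_classes).
    Returns predicted labels, mean probabilities, and predictive entropy.
    """
    probs = softmax(np.asarray(member_logits, dtype=float))
    mean_probs = probs.mean(axis=0)            # average over models
    labels = mean_probs.argmax(axis=-1)        # most probable class per example
    # Entropy of the averaged distribution: higher means less certain.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return labels, mean_probs, entropy

# Made-up logits from three hypothetical models on two examples (binary task).
member_logits = np.array([
    [[2.0, 0.0], [0.2, 0.1]],
    [[1.5, 0.5], [-0.3, 0.4]],
    [[2.5, -0.5], [0.1, 0.0]],
])
labels, mean_probs, entropy = ensemble_predict(member_logits)
print(labels, entropy)
```

In this toy input the three models agree on the first example and disagree on the second, so the second example ends up with higher entropy. In practice the logits would come from separately fine-tuned models rather than hand-written arrays.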
Abstract
Foundational Language Models (FLMs) have advanced natural language processing (NLP) research. Current researchers are developing larger FLMs (e.g., XLNet, T5) to enable contextualized language representation, classification, and generation. While developing larger FLMs has been of significant advantage, it is also a liability concerning hallucination and predictive uncertainty. Fundamentally, larger FLMs are built on the same foundations as smaller FLMs (e.g., BERT); hence, one must recognize the potential of smaller FLMs which can be realized through an ensemble. In the current research, we perform a reality check on FLMs and their ensemble on benchmark and real-world datasets. We hypothesize that the ensembling of FLMs can influence the individualistic attention of FLMs and unravel the strength of coordination and cooperation of different FLMs. We utilize BERT and define three other…
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Adam · Linear Layer · Layer Normalization · Dense Connections · Weight Decay · Residual Connection
