Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models

Nancy Tyagi; Aidin Shiri; Surjodeep Sarkar; Abhishek Kumar Umrawal; Manas Gaur

arXiv:2308.12272 · cs.CL · August 24, 2023 · 1 citation

Open Access

TL;DR

This paper demonstrates that ensembling smaller foundational language models like BERT with novel techniques can outperform larger models, reducing hallucination and uncertainty in NLP tasks, especially in sensitive applications.

Contribution

The paper introduces a new ensemble approach, including a knowledge-guided deep ensemble, and shows that ensembles of smaller models can outperform larger single models.

Findings

01. Deep-Ensemble BERT outperforms BERT-large on benchmark datasets.

02. Ensembling enhances the coordination among FLMs, improving NLP task accuracy.

03. Smaller models with ensembling reduce hallucination and uncertainty.
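To make the idea of ensembling concrete, the sketch below averages the class probabilities of several classifiers for one input and reports the entropy of the averaged distribution as a rough uncertainty measure. It illustrates plain probability averaging only; it is not the paper's knowledge-guided deep ensemble, and the logits, the function names (`softmax`, `ensemble_probs`, `entropy`), and the two-class setup are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(per_model_logits):
    """Average the class probabilities of several models for one input."""
    probs = [softmax(l) for l in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def entropy(probs):
    """Shannon entropy in nats; higher means a less confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits from three small classifiers (assumed values,
# not taken from the paper) for a single input with two classes.
logits = [[2.0, 0.5], [1.2, 1.0], [0.3, 1.1]]

avg = ensemble_probs(logits)
prediction = max(range(len(avg)), key=avg.__getitem__)
uncertainty = entropy(avg)
```

Averaging probabilities rather than taking a majority vote keeps each model's confidence in play, so a single strongly confident model can outweigh two weakly confident ones.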

Abstract

Foundational Language Models (FLMs) have advanced natural language processing (NLP) research. Current researchers are developing larger FLMs (e.g., XLNet, T5) to enable contextualized language representation, classification, and generation. While developing larger FLMs has been of significant advantage, it is also a liability concerning hallucination and predictive uncertainty. Fundamentally, larger FLMs are built on the same foundations as smaller FLMs (e.g., BERT); hence, one must recognize the potential of smaller FLMs which can be realized through an ensemble. In the current research, we perform a reality check on FLMs and their ensemble on benchmark and real-world datasets. We hypothesize that the ensembling of FLMs can influence the individualistic attention of FLMs and unravel the strength of coordination and cooperation of different FLMs. We utilize BERT and define three other…

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Adam · Linear Layer · Layer Normalization · Dense Connections · Weight Decay · Residual Connection