On the Universality of Deep Contextual Language Models
Shaily Bhatt, Poonam Goyal, Sandipan Dandapat, Monojit Choudhury, Sunayana Sitaram

TL;DR
This paper investigates the concept of universality in deep contextual language models, analyzing their capabilities and limitations across diverse tasks, languages, and domains to guide future research for inclusive NLP applications.
Contribution
It defines seven key dimensions of universality for language models and reviews theoretical and empirical evidence supporting their performance across these dimensions.
Findings
Models perform well across multiple tasks and languages.
Current limitations highlight areas for improvement in fairness and inclusivity.
The survey provides a foundation for future research directions.
Abstract
Deep Contextual Language Models (LMs) like ELMo, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning. Furthermore, multilingual versions of such models like XLM-R and mBERT have given promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced languages. Due to this initial success, pre-trained models are being used as 'Universal Language Models' as the starting point across diverse tasks, domains, and languages. This work explores the notion of 'Universality' by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings. We outline the current…
Taxonomy
Topics
Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods
Attention Is All You Need · XLM-R · Linear Layer · mBERT · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay · Adam · Residual Connection
