Impossible Triangle: What's Next for Pre-trained Language Models?
Chenguang Zhu, Michael Zeng

TL;DR
This paper argues that current pre-trained language models cannot simultaneously offer moderate model size, state-of-the-art few-shot learning, and state-of-the-art fine-tuning, and explores future directions to overcome this trade-off.
Contribution
It introduces the concept of the Impossible Triangle for PLMs and analyzes existing methods and future research directions to address these core challenges.
Findings
Current PLMs lack at least one property of the Impossible Triangle.
Techniques like knowledge distillation and prompt learning help but add complexity.
Future research should focus on integrating these properties effectively.
Abstract
Recent developments in large-scale pre-trained language models (PLMs) have significantly improved model capability on various NLP tasks, both in performance after task-specific fine-tuning and in zero-shot / few-shot learning. However, many such models come with a dauntingly huge size that few institutions can afford to pre-train, fine-tune or even deploy, while moderate-sized models usually lack strong generalized few-shot learning capabilities. In this paper, we first elaborate the current obstacles of using PLMs in terms of the Impossible Triangle: 1) moderate model size, 2) state-of-the-art few-shot learning capability, and 3) state-of-the-art fine-tuning capability. We argue that all existing PLMs lack one or more properties from the Impossible Triangle. To remedy these missing properties of PLMs, various techniques have been proposed, such as knowledge…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
