Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji

TL;DR
Zemi is a zero-shot semi-parametric language model that combines retrieval-augmented multitask training with a novel fusion module, achieving strong performance on unseen tasks while being 3.9 times smaller than T0-3B.
Contribution
This work presents what is, to the authors' knowledge, the first semi-parametric language model to demonstrate strong zero-shot performance across diverse held-out tasks, using a novel multitask training and augmentation fusion approach.
Findings
Zemi outperforms T0-3B by 16% on seven tasks.
Zemi is 3.9 times smaller than T0-3B.
The semi-parametric design achieves effective zero-shot generalization.
Abstract
Although large language models have achieved impressive zero-shot ability, the huge model size generally incurs high cost. Recently, semi-parametric language models, which augment a smaller language model with an external retriever, have demonstrated promising language modeling capabilities. However, it remains unclear whether such semi-parametric language models can perform competitively with their fully-parametric counterparts on zero-shot generalization to downstream tasks. In this work, we introduce Zemi, a zero-shot semi-parametric language model. To the best of our knowledge, this is the first semi-parametric language model that can demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train Zemi with a novel semi-parametric multitask prompted training paradigm, which shows significant improvement compared with the parametric multitask…
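Illustration
The abstract describes semi-parametric models as a smaller language model augmented with an external retriever. The following is a minimal sketch of that general idea only, not the paper's method: it assumes a toy word-overlap retriever and plain text concatenation as stand-ins for Zemi's actual retriever and fusion module, neither of which is specified in the text above. All function names and the example corpus are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Count query tokens that also occur in the document (with multiplicity)."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(count, d[token]) for token, count in q.items())


def retrieve(query, corpus, k=2):
    """Return the k highest-scoring documents; ties keep corpus order."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_augmented_input(prompt, corpus, k=2):
    """Prepend retrieved documents to the prompt (assumed stand-in for fusion)."""
    retrieved = retrieve(prompt, corpus, k)
    context = " ".join(
        f"[retrieved {i + 1}] {doc}" for i, doc in enumerate(retrieved)
    )
    return f"{context} [prompt] {prompt}"


# Hypothetical external corpus; a real system would index far more text.
CORPUS = [
    "The Eiffel Tower is a landmark in Paris, France.",
    "Photosynthesis converts light into chemical energy in plants.",
    "Paris is the capital of France.",
]

if __name__ == "__main__":
    print(build_augmented_input("What is the capital of France?", CORPUS, k=2))
```

The augmented string would then be passed to the smaller language model in place of the bare prompt. The design point this illustrates is that knowledge can live in the external corpus rather than in model parameters, which is what allows the parametric component to stay small.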
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
