LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5
Chengwei Qin, Shafiq Joty

TL;DR
LFPT5 is a unified prompt-tuning framework for lifelong few-shot language learning that generates pseudo-samples to prevent forgetting and adapt to new tasks, outperforming previous methods across various settings.
Contribution
Proposes LFPT5, a prompt-tuning-based framework for lifelong few-shot learning that generates pseudo-samples of earlier tasks and tunes prompts for new ones.
Findings
Significantly outperforms previous methods on Lifelong Few-shot Language Learning (LFLL) tasks.
Effectively mitigates forgetting of previous knowledge.
Adapts to various task types with improved performance.
Abstract
Existing approaches to lifelong language learning rely on plenty of labeled data for learning a new task, which is hard to obtain in most real scenarios. Considering that humans can continually learn new tasks from a handful of examples, we expect the models also to be able to generalize well on new few-shot tasks without forgetting the previous ones. In this work, we define this more challenging yet practical problem as Lifelong Few-shot Language Learning (LFLL) and propose a unified framework for it based on prompt tuning (PT) of T5. Our framework, called LFPT5, takes full advantage of PT's strong few-shot learning ability, and simultaneously trains the model as a task solver and a data generator. Before learning a new domain of the same task type, LFPT5 generates pseudo (labeled) samples of previously learned domains, and later gets trained on those samples to alleviate forgetting of…
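The replay idea described in the abstract (generate pseudo-samples of earlier domains before learning a new one, then train on them) can be illustrated with a small sketch. This is not the authors' implementation: `ToyLearner`, `learn_sequence`, and `n_pseudo` are hypothetical names, the "model" only stores and returns examples instead of prompt-tuning T5, and mixing pseudo-samples with the new few-shot data in a single batch is an assumption about the training schedule.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (input text, target text)


class ToyLearner:
    """Hypothetical stand-in for a prompt-tuned model.

    It stores examples per domain and "generates" pseudo-samples by
    returning stored ones. A real system would sample from the model.
    """

    def __init__(self) -> None:
        self.seen: Dict[str, List[Example]] = {}

    def generate(self, domain: str, n: int) -> List[Example]:
        # Return up to n pseudo-samples for a previously learned domain.
        return self.seen.get(domain, [])[:n]

    def train(self, batch: List[Tuple[str, Example]]) -> None:
        # "Training" here just records each (domain, example) pair once.
        for domain, example in batch:
            stored = self.seen.setdefault(domain, [])
            if example not in stored:
                stored.append(example)


def learn_sequence(
    learner: ToyLearner,
    domains: List[Tuple[str, List[Example]]],
    n_pseudo: int = 2,
) -> List[int]:
    """Learn domains one by one; return the replay size used at each step."""
    learned: List[str] = []
    replay_sizes: List[int] = []
    for name, few_shot in domains:
        # 1. Before the new domain, generate pseudo-samples of old domains.
        replay = [
            (prev, ex)
            for prev in learned
            for ex in learner.generate(prev, n_pseudo)
        ]
        # 2. Train on the new few-shot data plus the pseudo-samples
        #    (joint batching is an assumption of this sketch).
        learner.train([(name, ex) for ex in few_shot] + replay)
        learned.append(name)
        replay_sizes.append(len(replay))
    return replay_sizes
```

In this sketch the replay set grows with the number of learned domains and is capped per domain by `n_pseudo`, so no original training data from earlier domains needs to be kept once the learner can regenerate it.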
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling
Methods: Gated Linear Unit · Attention Is All You Need · Linear Layer · Multi-Head Attention · Adafactor · Byte Pair Encoding · Inverse Square Root Schedule · Dropout · Layer Normalization
