Few-shot Sequence Learning with Transformers
Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

TL;DR
This paper introduces a simple, efficient Transformer-based method for few-shot sequence learning: a task-specific token is appended to the input sequence, and its embedding is optimized on the fly from a few labeled examples, without complex changes to the model architecture.
Contribution
Proposes a few-shot learning approach that represents each task as a token appended to the Transformer input, avoiding architecture changes such as adapter layers and the computation of second-order derivatives.
Findings
Performs comparably to existing methods in various tasks
More computationally efficient than current approaches
Utilizes compositional task descriptors to enhance performance
Abstract
Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples. Our approach does not require complicated changes to the model architecture such as adapter layers nor computing second order derivatives as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks, and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task…
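The core idea in the abstract, fitting only a task embedding while the model itself stays fixed, can be illustrated with a minimal sketch. This is not the authors' implementation: the frozen "model" below is a toy stand-in (a fixed embedding table with mean pooling and a dot product) rather than a Transformer, and the names EMBED, pooled, predict, loss, and adapt_task_embedding, as well as the toy task and hyperparameters, are assumptions made for illustration. What it does show is that adaptation needs only first-order gradients with respect to a single vector.

```python
import math
import random

random.seed(0)
DIM, VOCAB = 8, 20

# Frozen stand-in for a pretrained model: a fixed token-embedding table.
# Assumption: in the paper this role is played by a Transformer.
EMBED = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]

def pooled(tokens):
    """Mean of the frozen token embeddings for one input sequence."""
    return [sum(EMBED[t][d] for t in tokens) / len(tokens) for d in range(DIM)]

def predict(tokens, task_emb):
    """Probability of the positive label given a sequence and a task embedding."""
    h = pooled(tokens)
    logit = sum(hd * zd for hd, zd in zip(h, task_emb))
    return 1.0 / (1.0 + math.exp(-logit))

def loss(support, task_emb):
    """Mean binary cross-entropy over the support set."""
    eps = 1e-9
    total = 0.0
    for tokens, y in support:
        p = predict(tokens, task_emb)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(support)

def adapt_task_embedding(support, steps=200, lr=0.5):
    """Fit only the task embedding with first-order gradient descent.

    EMBED is never updated; the task embedding is the sole trainable vector.
    """
    z = [0.0] * DIM
    for _ in range(steps):
        grad = [0.0] * DIM
        for tokens, y in support:
            h = pooled(tokens)
            err = predict(tokens, z) - y  # derivative of BCE w.r.t. the logit
            for d in range(DIM):
                grad[d] += err * h[d] / len(support)
        z = [zd - lr * gd for zd, gd in zip(z, grad)]
    return z

# Hypothetical few-shot task: label 1 when every token id is below 10.
support = [([3, 3, 1], 1), ([15, 12, 19], 0), ([7, 2, 5], 1), ([11, 18, 14], 0)]
z = adapt_task_embedding(support)
```

Because only one small vector is updated, each adaptation step is cheap and no second-order derivatives are involved, which is the property the abstract emphasizes. In a real setting the task embedding would be fed to the Transformer alongside the input tokens rather than combined through a dot product.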
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research
