Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On

TL;DR
This paper demonstrates that with proper prompt design and configuration, encoder-decoder (seq2seq) models can be highly effective few-shot learners across diverse tasks, outperforming larger decoder-only models.
Contribution
It provides the first extensive comparison of in-context few-shot learning in seq2seq versus decoder-only models and introduces methods to enhance seq2seq few-shot learning capabilities.
Findings
Seq2seq models can perform well in few-shot learning with proper prompting.
Proposed methods outperform larger decoder-only models in various tasks.
Seq2seq models show broad applicability beyond traditional tasks.
Abstract
In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide a first-ever extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger…
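The abstract names the two proposed methods without describing them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a T5-style span-corruption objective, where a sentinel token (`<extra_id_0>`) marks the span the decoder should fill in, and a FiD-like fusion in which each demonstration is encoded separately alongside the query and the encoded sequences are concatenated before decoding. The prompt template, the function names, and `toy_encode` are hypothetical and exist only for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: a T5-style sentinel token marks the span the decoder must generate.
SENTINEL = "<extra_id_0>"

Shot = Tuple[str, str]  # (question, answer) demonstration pair


def format_example(question: str, answer: str = SENTINEL) -> str:
    """Render one example; an unanswered query ends with the sentinel."""
    return f"Question: {question} Answer: {answer}"


def build_prompt(shots: Sequence[Shot], query: str) -> str:
    """Concatenate all demonstrations and the query into one encoder input."""
    parts = [format_example(q, a) for q, a in shots]
    parts.append(format_example(query))
    return "\n".join(parts)


def build_fusion_inputs(shots: Sequence[Shot], query: str) -> List[str]:
    """Build one encoder input per demonstration, each paired with the query."""
    return [build_prompt([shot], query) for shot in shots]


def toy_encode(text: str) -> List[List[float]]:
    """Hypothetical stand-in for an encoder: one 1-d vector per whitespace token."""
    return [[float(len(token))] for token in text.split()]


def fuse_encodings(encoded: Sequence[Sequence[List[float]]]) -> List[List[float]]:
    """Concatenate per-input token vectors along the sequence axis."""
    return [vector for sequence in encoded for vector in sequence]


if __name__ == "__main__":
    shots = [("2+2?", "4"), ("3+3?", "6")]
    print(build_prompt(shots, "5+5?"))
    inputs = build_fusion_inputs(shots, "5+5?")
    fused = fuse_encodings([toy_encode(text) for text in inputs])
    print(len(inputs), len(fused))
```

In this sketch the single concatenated prompt grows with the number of shots, whereas the fusion variant keeps each encoder input short and defers the combination to the step just before decoding. Whether that matches the paper's design is an assumption.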
Peer Reviews
Decision: Submitted to ICLR 2024
+ This work develops an in-context evaluation toolkit for seq2seq models and conducts extensive experiments to investigate their performance in zero-shot to few-shot scenarios.
+ The authors explore prompting strategies and fusion-based approaches in encoder-decoder models, revealing their capacity for zero-shot and few-shot learning.
+ The comprehensive experimental comparison between decoder-only and encoder-decoder models could be very useful for researchers in this field.
- The technical novelty of this work is a bit weak. The proposed objective-aligned prompting and fusion-based approach are straightforward.
- The detailed description of the objective-aligned prompting method is missing.
1. This paper is well organized and easy to follow.
2. The motivation is reasonable and experiments are abundant.
3. The findings and conclusions about in-context few-shot learning capabilities of seq2seq models will be interesting to the community.
Several main concerns are as follows:
1. This paper claims that the objective-aligned prompting strategy is one of its key contributions. However, this strategy seems very straightforward, and some recent state-of-the-art works have already introduced such a strategy. In this sense, this contribution is somewhat limited.
2. The second contribution of this work is a fusion-based approach, which also comes from existing works such as RAG and FiD. Therefore, what’s the main difference and c
The primary strength of this work seems to come from experimentally demonstrating that the seq2seq model can outperform a decoder-only model with six times as many parameters across diverse datasets.
Would've liked to see some evaluations around more varied generative tasks like Math/Coding which are more practically useful.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Pneumonia and Respiratory Infections · Machine Learning and Algorithms
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence · ALIGN
