Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

Jihyeon Lee; Dain Kim; Doohae Jung; Boseop Kim; Kyoung-Woon On

arXiv:2307.14856 · cs.CL · August 28, 2024


Open Access · 3 Reviews

TL;DR

This paper demonstrates that with proper prompt design and configuration, encoder-decoder (seq2seq) models can be highly effective few-shot learners across diverse tasks, outperforming larger decoder-only models.

Contribution

It provides the first extensive comparison of in-context few-shot learning in seq2seq versus decoder-only models and introduces methods to enhance seq2seq few-shot learning capabilities.

Findings

01. Seq2seq models can perform well in few-shot learning with proper prompting.
02. Proposed methods outperform larger decoder-only models in various tasks.
03. Seq2seq models show broad applicability beyond traditional tasks.

Abstract

In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide a first-ever extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger…
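The abstract names the two proposed methods without describing them, so the following is only a rough sketch of what they might look like at the prompt-construction level. It assumes that "objective-aligned" means shaping the prompt like a span-infilling pretraining objective (here, ending it with a T5-style sentinel token) and that "fusion-based" means giving the encoder one input per few-shot example, in the spirit of the RAG and FiD work the reviewers mention. The function names, the `Question:`/`Answer:` template, and the choice of sentinel are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# T5-style sentinel token, assumed here purely for illustration.
SENTINEL = "<extra_id_0>"

Shot = Tuple[str, str]  # (question, answer) pair used as a few-shot example


def format_shot(question: str, answer: str) -> str:
    """Render one solved example with a hypothetical template."""
    return f"Question: {question} Answer: {answer}"


def concatenated_prompt(shots: List[Shot], query: str) -> str:
    """Decoder-style prompt: all shots and the open query in one sequence."""
    parts = [format_shot(q, a) for q, a in shots]
    parts.append(f"Question: {query} Answer:")
    return "\n".join(parts)


def objective_aligned_prompt(shots: List[Shot], query: str) -> str:
    """Same prompt, with the answer slot marked by a sentinel token so the
    input resembles a span-infilling pretraining example (an assumption)."""
    return concatenated_prompt(shots, query) + " " + SENTINEL


def fusion_inputs(shots: List[Shot], query: str) -> List[str]:
    """One encoder input per shot, each paired with the query (FiD-like)."""
    return [objective_aligned_prompt([shot], query) for shot in shots]


shots = [("What is 2 + 2?", "4"), ("What is 3 + 3?", "6")]
single_input = objective_aligned_prompt(shots, "What is 1 + 1?")
separate_inputs = fusion_inputs(shots, "What is 1 + 1?")
```

In a fusion setup of this kind, each string in `separate_inputs` would be encoded independently and the encoder outputs combined before decoding; that model-side step is not shown here.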

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 5

Strengths

+ This work develops an in-context evaluation toolkit for seq2seq models and conducts extensive experiments to investigate their performance in zero-shot to few-shot scenarios.
+ The authors explore prompting strategies and fusion-based approaches in encoder-decoder models, which reveal the models' zero-/few-shot learning ability.
+ The comprehensive comparison experiments between decoder-only and encoder-decoder models could be very useful for researchers in this field.

Weaknesses

- The technical novelty of this work is a bit weak. The proposed objective-aligned prompting and fusion-based approach are straightforward.
- The detailed description of the objective-aligned prompting method is missing.

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. This paper is well organized and easy to follow.
2. The motivation is reasonable and experiments are abundant.
3. The findings and conclusions about in-context few-shot learning capabilities of seq2seq models will be interesting to the community.

Weaknesses

Several main concerns are as follows:
1. This paper claims that the objective-aligned prompting strategy is one of its key contributions. However, this strategy seems to be very straightforward, and some recent state-of-the-art works have already introduced such a strategy. In this sense, this contribution is somewhat limited.
2. The second contribution of this work is a fusion-based approach, which also comes from existing works, such as RAG and FiD. Therefore, what’s the main difference and c

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

The primary strength of this work seems to come from experimentally demonstrating that the seq2seq model can outperform a decoder-only model with six times as many parameters across diverse datasets.

Weaknesses

It would have been good to see evaluations on more varied generative tasks, such as math and coding, which are more practically useful.


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Pneumonia and Respiratory Infections · Machine Learning and Algorithms

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence · ALIGN