Unsupervised Meta-Learning via In-Context Learning

Anna Vettoruzzo; Lorenzo Braccaioli; Joaquin Vanschoren; Marlena Nowaczyk

arXiv:2405.16124·cs.LG·February 11, 2025

TL;DR

This paper introduces a novel unsupervised meta-learning method that uses in-context learning with transformers to improve transferability to new tasks, achieving state-of-the-art results on benchmarks.

Contribution

It reframes unsupervised meta-learning as a sequence modeling problem using transformers, enabling better generalization through diverse task creation strategies.

Findings

01. Outperforms existing unsupervised meta-learning baselines.

02. Achieves results comparable to supervised and self-supervised methods.

03. Demonstrates the effectiveness of in-context learning for meta-learning.

Abstract

Unsupervised meta-learning aims to learn feature representations from unsupervised datasets that can transfer to downstream tasks with limited labeled data. In this paper, we propose a novel approach to unsupervised meta-learning that leverages the generalization abilities of in-context learning observed in transformer architectures. Our method reframes meta-learning as a sequence modeling problem, enabling the transformer encoder to learn task context from support images and utilize it to predict query images. At the core of our approach lies the creation of diverse tasks generated using a combination of data augmentations and a mixing strategy that challenges the model during training while fostering generalization to unseen tasks at test time. Experimental results on benchmark datasets showcase the superiority of our approach over existing unsupervised meta-learning baselines,…
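The task-creation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`augment`, `make_pseudo_task`), the toy flip-and-noise augmentation, the pixel-level convex mixing, and the mixing range `lam_range` are all assumptions chosen for illustration. The idea shown is only the general one: each sampled unlabeled image defines its own pseudo-class, support examples are augmented views of it, and query examples are augmented views blended with another image from the dataset.

```python
import numpy as np

def augment(img, rng):
    """Toy augmentation (assumption): random horizontal flip plus small noise."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # flip along the width axis (H, W, C layout)
    noisy = img + rng.normal(0.0, 0.05, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def make_pseudo_task(unlabeled, n_way=5, k_shot=1, n_query=2,
                     lam_range=(0.0, 0.5), rng=None):
    """Build one N-way K-shot pseudo-task from unlabeled images in [0, 1].

    Hypothetical helper: every sampled image becomes its own pseudo-class.
    Query images mix an augmented view of the class image with another
    dataset image; lam < 0.5 keeps the class image dominant.
    """
    rng = rng or np.random.default_rng()
    class_idx = rng.choice(len(unlabeled), size=n_way, replace=False)
    support_x, support_y, query_x, query_y = [], [], [], []
    for label, i in enumerate(class_idx):
        for _ in range(k_shot):
            support_x.append(augment(unlabeled[i], rng))
            support_y.append(label)
        for _ in range(n_query):
            j = rng.integers(len(unlabeled))   # mixing partner (may be any image)
            lam = rng.uniform(*lam_range)      # mixing strength (assumed range)
            mixed = (1.0 - lam) * augment(unlabeled[i], rng) + lam * unlabeled[j]
            query_x.append(mixed)
            query_y.append(label)
    return (np.stack(support_x), np.array(support_y),
            np.stack(query_x), np.array(query_y))
```

In a sequence-modeling setup like the one the abstract outlines, the support images with their pseudo-labels and a query image would then be passed together to the transformer encoder, which predicts the query's pseudo-label from the context.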

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

1) The work proposes a novel pseudo-task creation mechanism to generate FSL tasks out of an unsupervised dataset, providing a new (maybe efficient) way to learn from unlabeled data. 2) Experiments with CAML show that CAMeLU outperforms other unsupervised methods and achieves promising results.

Weaknesses

However, I have the following concerns: 1) The model in this work is the same as CAML, which shows limited novelty. 2) Though the task formulation is interesting, only CAML and its successor (e.g. CAMeLU) are fairly trained in the setting. Other unsupervised methods cannot leverage the supervised pseudo-tasks as supervised methods do. Essentially, CAMeLU is a supervised method. The pseudo-task generation provides a way to bridge the unsupervised scenario and the supervised scenario. 3) Instead o

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. The paper is relatively straightforward to read and follow. 2. The performance boost compared to the baselines is significant on all benchmarks. 3. The code is provided and, although I did not run it, it looks well documented and easy to run, which will make reproducing the results straightforward.

Weaknesses

1. The model may become very large due to the added transformer model, which may become computationally expensive and challenging to handle compared to methods that use only CNN architectures. 2. Comparison is limited to only a few methods, despite the fact that meta-learning has a rich literature. 3. Analytic experiments are limited and do not provide much insight into the proposed method. For example, what are the circumstances and criteria under which the proposed method is more effective? What are

Reviewer 03 · Rating 6 · Confidence 5

Strengths

1. Overcomes the limitation in the UML field of relying solely on simple data augmentation for constructing training tasks, by proposing a novel sequence modeling-based task construction approach. 2. Effectively leverages the advantages of in-context learning in LLMs. 3. Provides extensive experiments and a thorough hyperparameter tuning process, which comprehensively demonstrate the advantages of the proposed algorithm.

Weaknesses

1. Lacks an explanation of the benefits of combining transformer architecture with the proposed data construction method, and does not clarify whether other architectures could also be adapted. In short, there is insufficient discussion of the model's generalizability. 2. Lacks detailed descriptions of the fixed feature extractor f and the learned class encoder g.

Videos

Unsupervised Meta-Learning via In-Context Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Human Pose and Action Recognition