Exploiting a Zoo of Checkpoints for Unseen Tasks

Jiaji Huang; Qiang Qiu; Kenneth Church

arXiv:2111.03628 · cs.AI · November 8, 2021


TL;DR

This paper introduces a method to select representative model checkpoints using Gaussian processes and mutual information, enabling better generalization to unseen tasks in NLP and computer vision.

Contribution

It proposes a novel approach to identify checkpoint subsets that effectively cover the task space, improving transferability to new tasks.

Findings

01 Selected checkpoints outperform random choices on unseen tasks.

02 The method is effective across NLP and computer vision domains.

03 The approach leverages unlabeled data for model selection.

Abstract

There are so many models in the literature that it is difficult for practitioners to decide which combinations are likely to be effective for a new task. This paper attempts to address this question by capturing relationships among checkpoints published on the web. We model the space of tasks as a Gaussian process. The covariance can be estimated from checkpoints and unlabeled probing data. With the Gaussian process, we can identify representative checkpoints by a maximum mutual information criterion. This objective is submodular. A greedy method identifies representatives that are likely to "cover" the task space. These representatives generalize to new tasks with superior performance. Empirical evidence is provided for applications from both computational linguistics and computer vision.
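The selection step described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository. It assumes the covariance is approximated by linear centered kernel alignment (CKA) between checkpoint features computed on the same unlabeled probing inputs, and it assumes the greedy rule is the classic one for Gaussian process mutual information: at each step, pick the candidate whose variance given the already selected set is largest relative to its variance given all remaining candidates. The function names `linear_cka_matrix`, `conditional_variance`, and `greedy_mi_selection` and the `jitter` parameter are hypothetical.

```python
import numpy as np


def linear_cka_matrix(features):
    """Assumed covariance estimate: pairwise linear CKA between checkpoints.

    features: list of (n_samples, d_i) arrays, one per checkpoint, all
    computed on the same unlabeled probing inputs.
    """
    grams = []
    for F in features:
        F = F - F.mean(axis=0, keepdims=True)  # center each feature dimension
        G = F @ F.T                            # centered Gram matrix over samples
        grams.append(G / np.linalg.norm(G))    # scale to unit Frobenius norm
    m = len(grams)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = np.sum(grams[i] * grams[j])  # Frobenius inner product
    return K


def conditional_variance(K, y, A):
    """Variance of variable y given the variables indexed by list A."""
    if not A:
        return K[y, y]
    K_AA = K[np.ix_(A, A)]
    k_yA = K[y, A]
    return K[y, y] - k_yA @ np.linalg.solve(K_AA, k_yA)


def greedy_mi_selection(K, k, jitter=1e-8):
    """Greedily pick k indices under the assumed mutual information rule."""
    n = K.shape[0]
    K = K + jitter * np.eye(n)  # small diagonal term for numerical stability
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for y in range(n):
            if y in selected:
                continue
            rest = [i for i in range(n) if i != y and i not in selected]
            # High variance given the selected set and low variance given the
            # rest means y is informative about what has not been covered yet.
            score = conditional_variance(K, y, selected) / conditional_variance(K, y, rest)
            if score > best_score:
                best, best_score = y, score
        selected.append(best)
    return selected
```

With a covariance made of two tightly correlated groups of checkpoints, calling `greedy_mi_selection(K, 2)` returns one index from each group, which is the "covering" behavior the abstract describes. The double loop over candidates solves a linear system per candidate, so this sketch is only practical for a modest number of checkpoints.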

Code & Models

Repositories

baidu-research/task_space
PyTorch · Official

Videos

Exploiting a Zoo of Checkpoints for Unseen Tasks · SlidesLive

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications