Learning to Route Among Specialized Experts for Zero-Shot Generalization

Mohammed Muqeeth; Haokun Liu; Yufan Liu; Colin Raffel

arXiv:2402.05859 · cs.LG · June 24, 2024 · 1 citation

TL;DR

This paper introduces PHATGOOSE, a post-hoc routing method that adaptively selects among specialized experts for each token and at each layer, improving zero-shot generalization to unseen tasks without requiring simultaneous access to the datasets used to create the experts.

Contribution

PHATGOOSE is a post-hoc routing approach that learns to route among modules produced by parameter-efficient fine-tuning. It improves zero-shot generalization by choosing experts adaptively per token and per layer, and it outperforms previous post-hoc routing methods.

Findings

01 PHATGOOSE outperforms past post-hoc routing methods.

02 In some cases, PHATGOOSE surpasses explicit multitask training.

03 The method makes adaptive per-token and per-module expert choices.

Abstract

Recently, there has been a widespread proliferation of "expert" language models that are specialized to a specific task or domain through parameter-efficient fine-tuning. How can we recycle large collections of expert language models to improve zero-shot generalization to unseen tasks? In this work, we propose Post-Hoc Adaptive Tokenwise Gating Over an Ocean of Specialized Experts (PHATGOOSE), which learns to route among specialized modules that were produced through parameter-efficient fine-tuning. Unlike past methods that learn to route among specialized models, PHATGOOSE explores the possibility that zero-shot generalization will be improved if different experts can be adaptively chosen for each token and at each layer in the model. Crucially, our method is post-hoc: it does not require simultaneous access to the datasets used to create the specialized models and only requires a…
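The per-token, per-layer expert selection described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function name `route_tokens`, the LoRA-style low-rank experts (`lora_a`, `lora_b`), one gate vector per expert, dot-product scoring, and a softmax over the top-k scores. The abstract is truncated and does not specify these details.

```python
import numpy as np

def route_tokens(hidden, gates, lora_a, lora_b, top_k=2):
    """Hypothetical tokenwise routing at a single layer.

    hidden: (tokens, d)        token activations entering the layer
    gates:  (experts, d)       one gate vector per expert (assumed)
    lora_a: (experts, d, r)    down-projection of each expert (assumed LoRA-style)
    lora_b: (experts, r, d_out) up-projection of each expert
    Returns the mixed expert output and the chosen expert indices per token.
    """
    # Score every expert for every token with a dot product.
    scores = hidden @ gates.T                               # (tokens, experts)
    # Keep the top_k highest-scoring experts for each token.
    top_idx = np.argsort(-scores, axis=1)[:, :top_k]        # (tokens, top_k)
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    # Softmax over the selected scores only (numerically stabilized).
    weights = np.exp(top_scores - top_scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    out = np.zeros((hidden.shape[0], lora_b.shape[2]))
    for t in range(hidden.shape[0]):
        for j in range(top_k):
            e = top_idx[t, j]
            # Weighted sum of the selected experts' low-rank updates.
            out[t] += weights[t, j] * (hidden[t] @ lora_a[e] @ lora_b[e])
    return out, top_idx

# Usage: 5 tokens, hidden size 8, 4 experts of rank 2.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 8))
gates = rng.normal(size=(4, 8))
lora_a = rng.normal(size=(4, 8, 2))
lora_b = rng.normal(size=(4, 2, 8))
mixed, chosen = route_tokens(hidden, gates, lora_a, lora_b, top_k=2)
print(mixed.shape, chosen.shape)
```

Because the routing is computed independently for each token, and a separate set of gates would be kept at each layer, two tokens in the same sequence can be served by different experts. That is the property the abstract contrasts with routing among whole specialized models.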

Taxonomy

Topics: Educational Technology and Assessment