Learning to Route Among Specialized Experts for Zero-Shot Generalization
Mohammed Muqeeth, Haokun Liu, Yufan Liu, Colin Raffel

TL;DR
This paper introduces PHATGOOSE, a post-hoc routing method that adaptively selects specialized experts for each token and at each layer, improving zero-shot generalization without requiring simultaneous access to the datasets used to train the experts.
Contribution
PHATGOOSE is a post-hoc routing approach that improves zero-shot generalization by adaptively selecting experts per token and per layer, outperforming previous post-hoc routing methods.
Findings
PHATGOOSE outperforms past post-hoc routing methods.
In some cases, PHATGOOSE surpasses explicit multitask training.
The method effectively makes adaptive per-token and per-module expert choices.
Abstract
Recently, there has been a widespread proliferation of "expert" language models that are specialized to a specific task or domain through parameter-efficient fine-tuning. How can we recycle large collections of expert language models to improve zero-shot generalization to unseen tasks? In this work, we propose Post-Hoc Adaptive Tokenwise Gating Over an Ocean of Specialized Experts (PHATGOOSE), which learns to route among specialized modules that were produced through parameter-efficient fine-tuning. Unlike past methods that learn to route among specialized models, PHATGOOSE explores the possibility that zero-shot generalization will be improved if different experts can be adaptively chosen for each token and at each layer in the model. Crucially, our method is post-hoc: it does not require simultaneous access to the datasets used to create the specialized models and only requires a…
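Illustration
The idea of choosing experts separately for each token at a given layer can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names route_token, mix_expert_outputs, and gate_vectors are hypothetical, and scoring experts by a dot product followed by a softmax over the top-k scores is an assumed routing rule that the abstract does not specify.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a short list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def route_token(hidden, gate_vectors, k=2):
    """Pick the top-k experts for one token at one layer.

    hidden: the token's activation at this layer (list of floats).
    gate_vectors: one routing vector per expert (hypothetical; assumed
        to have been learned separately for each expert).
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    scores = [dot(hidden, g) for g in gate_vectors]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return list(zip(top, weights))

def mix_expert_outputs(hidden, experts, gate_vectors, k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    out = [0.0] * len(hidden)
    for idx, w in route_token(hidden, gate_vectors, k):
        expert_out = experts[idx](hidden)
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out

# Toy setup: three stand-in "experts" acting on 2-dimensional activations.
gate_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [
    lambda h: [2.0 * x for x in h],   # stand-in for expert 0
    lambda h: [x + 1.0 for x in h],   # stand-in for expert 1
    lambda h: [0.0 for _ in h],       # stand-in for expert 2
]
tokens = [[3.0, 0.5], [0.2, 4.0]]

# Each token is routed independently, so different tokens in the same
# sequence can end up using different experts.
choices = [route_token(t, gate_vectors, k=1) for t in tokens]
```

In this toy setup the first token is routed to expert 0 and the second to expert 1. In a full model the same selection would be repeated independently at every layer that carries expert modules, which is what makes the routing both per-token and per-module.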
Taxonomy
Topics: Educational Technology and Assessment
