Mixture of Experts Meets Prompt-Based Continual Learning

Minh Le; An Nguyen; Huy Nguyen; Trang Nguyen; Trang Pham; Linh Van Ngo; Nhat Ho

arXiv:2405.14124 · cs.LG · January 7, 2025 · 1 citation

TL;DR

This paper shows that the attention block in pre-trained models implicitly implements a mixture-of-experts architecture, and builds on this view to propose NoRGa, a gating mechanism that improves prompt-based continual learning.

Contribution

The paper provides a theoretical explanation of why prompts are effective, introduces a new gating mechanism (NoRGa), and demonstrates improved continual learning performance.

Findings

01. Attention blocks encode a mixture-of-experts architecture.

02. NoRGa improves continual learning performance.

03. The claims are supported by theoretical analysis and by empirical results across benchmarks.

Abstract

Exploiting the power of pre-trained models, prompt-based approaches stand out compared to other continual learning solutions in effectively preventing catastrophic forgetting, even with very few learnable parameters and without the need for a memory buffer. While existing prompt-based continual learning methods excel in leveraging prompts for state-of-the-art performance, they often lack a theoretical explanation for the effectiveness of prompting. This paper conducts a theoretical analysis to unravel how prompts bestow such advantages in continual learning, thus offering a new perspective on prompt design. We first show that the attention block of pre-trained models like Vision Transformers inherently encodes a special mixture of experts architecture, characterized by linear experts and quadratic gating score functions. This realization drives us to provide a novel view on prefix…
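The mixture-of-experts reading of attention described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code: it assumes a single attention head without biases, the standard 1/sqrt(d) scaling, and hypothetical names (`attention_row`, `moe_row`, and the toy matrices `W_q`, `W_k`, `W_v`). Under those assumptions, the output for query token i is a softmax-gated sum in which expert j is a linear map of token j, and the gating score is bilinear in tokens i and j, hence quadratic in the concatenated input.

```python
import math

def matvec(W, x):
    """Return W @ x for a matrix W (list of rows) and a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_row(tokens, W_q, W_k, W_v, i):
    """Standard single-head attention output for query token i."""
    d = len(W_q)  # query/key dimension
    q = matvec(W_q, tokens[i])
    keys = [matvec(W_k, x) for x in tokens]
    values = [matvec(W_v, x) for x in tokens]
    weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
    return [sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))]

def moe_row(tokens, W_q, W_k, W_v, i):
    """The same output, written as a mixture of experts.

    Expert j is the linear map x_j -> W_v x_j.
    Gate score (i, j) is x_i^T (W_q^T W_k) x_j / sqrt(d), a quadratic
    form in the concatenation of all tokens.
    """
    d = len(W_q)
    n_in = len(tokens[0])
    # A = W_q^T W_k, so the score depends on the tokens only through x_i^T A x_j
    A = [[sum(W_q[r][a] * W_k[r][b] for r in range(d)) for b in range(n_in)]
         for a in range(n_in)]
    scores = [dot(tokens[i], matvec(A, x_j)) / math.sqrt(d) for x_j in tokens]
    gates = softmax(scores)                       # gating weights over experts
    experts = [matvec(W_v, x_j) for x_j in tokens]  # linear expert outputs
    return [sum(g * e[c] for g, e in zip(gates, experts))
            for c in range(len(experts[0]))]

# Toy data (illustrative values only): 3 tokens of dimension 3, head dimension 2.
tokens = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [0.0, 1.0, 1.0]]
W_q = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]
W_k = [[0.1, 0.5, -0.3], [0.7, 0.0, 0.2]]
W_v = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
```

Calling `attention_row(tokens, W_q, W_k, W_v, i)` and `moe_row(tokens, W_q, W_k, W_v, i)` for the same `i` should agree up to floating-point error, which is the sense in which the two formulations coincide in this simplified setting.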

Code & Models

Repositories

minhchuyentoancbn/moe_promptcl
PyTorch · Official

Videos

Mixture of Experts Meets Prompt-Based Continual Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning