MoExtend: Tuning New Experts for Modality and Task Extension

Shanshan Zhong; Shanghua Gao; Zhongzhan Huang; Wushao Wen; Marinka Zitnik; Pan Zhou

arXiv:2408.03511·cs.CV·August 8, 2024

TL;DR

MoExtend introduces a framework that efficiently extends large language models with new modalities and tasks by integrating new experts into pre-trained MoE models, avoiding full fine-tuning and catastrophic forgetting.

Contribution

The paper presents a novel method for modality and task extension in MoE models that requires no tuning of the pretrained model, enabling rapid and effective multimodal adaptation.

Findings

01. MoExtend effectively enhances multimodal capabilities of LLMs.

02. The approach reduces training costs and mitigates catastrophic forgetting.

03. Experimental results show improved performance in multimodal tasks.

Abstract

Large language models (LLMs) excel in various tasks but are primarily trained on text data, limiting their application scope. Expanding LLM capabilities to include vision-language understanding is vital, yet training them on multimodal data from scratch is challenging and costly. Existing instruction tuning methods, e.g., LLaVA, often connect a pretrained CLIP vision encoder and LLMs via full fine-tuning of the LLMs to bridge the modality gap. However, full fine-tuning is plagued by catastrophic forgetting, i.e., forgetting previous knowledge, and by high training costs, particularly in an era of increasing tasks and modalities. To solve this issue, we introduce MoExtend, an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge…
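The idea of adding a new expert to a pretrained MoE layer without touching existing weights can be illustrated with a toy sketch. This is not the authors' implementation (see the official repository for that); it is a minimal pure-Python illustration under stated assumptions. All names here (`Expert`, `MoELayer`, `extend_with_new_expert`, `init_bias`) are hypothetical, the experts are plain linear maps, and the very negative initial router bias for the new expert is an illustrative choice that is not taken from the paper.

```python
import math
import random


def dot(row, x):
    return sum(w * xi for w, xi in zip(row, x))


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class Expert:
    """Toy linear expert y = W x; `trainable` marks whether it may be updated."""

    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable

    def __call__(self, x):
        return [dot(row, x) for row in self.weight]


class MoELayer:
    """Toy top-k MoE layer: one router row and one router bias per expert."""

    def __init__(self, experts, router_weight, router_bias, top_k=2):
        self.experts = experts
        self.router_weight = router_weight
        self.router_bias = router_bias
        self.top_k = top_k

    def forward(self, x):
        logits = [dot(w, x) + b for w, b in zip(self.router_weight, self.router_bias)]
        # Keep the top-k experts and softmax-normalize over their logits only.
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        peak = max(logits[i] for i in chosen)
        exps = [math.exp(logits[i] - peak) for i in chosen]
        total = sum(exps)
        out = [0.0] * len(x)
        for i, e in zip(chosen, exps):
            y = self.experts[i](x)
            out = [o + (e / total) * yi for o, yi in zip(out, y)]
        return out


def extend_with_new_expert(layer, dim, rng, init_bias=-1e9):
    """Freeze the existing experts and append one new trainable expert.

    Assumption (not from the paper): the new router entry starts with a very
    negative bias, so the new expert is never selected until it is trained.
    """
    for expert in layer.experts:
        expert.trainable = False
    layer.experts.append(Expert(random_matrix(rng, dim, dim), trainable=True))
    layer.router_weight.append([0.0] * dim)
    layer.router_bias.append(init_bias)


rng = random.Random(0)
dim, num_experts = 3, 4
layer = MoELayer(
    experts=[Expert(random_matrix(rng, dim, dim)) for _ in range(num_experts)],
    router_weight=random_matrix(rng, num_experts, dim),
    router_bias=[0.0] * num_experts,
)
x = [0.5, -1.0, 2.0]
before = layer.forward(x)
extend_with_new_expert(layer, dim, rng)
after = layer.forward(x)
```

In this toy setup the layer's output for `x` is the same before and after the extension, because the original experts are untouched and the new expert is not yet routed to. That is one simple way to picture why leaving the pretrained weights alone can help against catastrophic forgetting; how MoExtend actually initializes and trains its new experts is described in the paper, not here.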

Code & Models

Repositories

zhongshsh/moextend (PyTorch, official)

Videos

MoExtend: Tuning New Experts for Modality and Task Extension · Underline

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Contrastive Language-Image Pre-training · Mixture of Experts