SoupLM: Model Integration in Large Language and Multi-Modal Models

Yue Bai; Zichen Zhang; Jiasen Lu; Yun Fu

arXiv:2407.08196·cs.AI·July 12, 2024

Open Access

TL;DR

This paper introduces SoupLM, a cost-efficient method to assemble large language and multimodal models from existing variants, enhancing capabilities without extensive retraining.

Contribution

The paper proposes a novel 'soup' strategy to combine different LLM variants into a single multimodal model, reducing training costs and leveraging diverse domain knowledge.

Findings

01 Effective model assembly with performance gains
02 Cost reduction in training large models
03 Insights into model interpolation behavior

Abstract

Training large language models (LLMs) and multimodal LLMs necessitates significant computing resources, and existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks. For instance, LLaMA, Vicuna, and LLaVA are three LLM variants trained with LLaMA base models using very different training recipes, tasks, and data modalities. The training cost and complexity for such LLM variants grow rapidly. In this study, we propose to use a soup strategy to assemble these LLM variants into a single well-generalized multimodal LLM (SoupLM) in a cost-efficient manner. Assembling these LLM variants efficiently brings knowledge and specialities trained from different domains and data modalities into an integrated one (e.g., chatbot speciality from user-shared conversations for Vicuna, and visual capacity from vision-language data for LLaVA),…
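The abstract does not spell out how the variants are combined. One common form of a "soup" is linear interpolation of weights between checkpoints that share an architecture and a base model. The sketch below illustrates that idea on toy data; it is not the authors' implementation. The function name `soup`, the fixed mixing coefficient `alpha`, and the toy parameter names are assumptions introduced here for illustration.

```python
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of weights.
# Real checkpoints hold tensors; flat lists keep the sketch dependency-free.
StateDict = Dict[str, List[float]]


def soup(state_a: StateDict, state_b: StateDict, alpha: float) -> StateDict:
    """Return alpha * state_a + (1 - alpha) * state_b, parameter by parameter.

    Assumes both checkpoints come from the same architecture, so they
    share parameter names and shapes (hypothetical setup for illustration).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name, weights_a in state_a.items():
        weights_b = state_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * a + (1.0 - alpha) * b
            for a, b in zip(weights_a, weights_b)
        ]
    return merged


# Toy stand-ins for two variants fine-tuned from the same base model.
chat_variant = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
vision_variant = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

merged = soup(chat_variant, vision_variant, alpha=0.5)
print(merged)
```

With `alpha=1.0` the result equals the first checkpoint and with `alpha=0.0` the second, so sweeping `alpha` between the two is one simple way to probe interpolation behavior between variants.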

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: LLaMA · Balanced Selection