Symbiosis: Multi-Adapter Inference and Fine-Tuning
Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman

TL;DR
Symbiosis introduces a flexible, resource-efficient framework for multi-adapter inference and fine-tuning of large language models, enabling shared base models, independent resource management, and privacy preservation.
Contribution
Symbiosis proposes a split-execution technique that decouples adapter execution from base model execution, so multiple adapters can run simultaneously against a shared base model with improved resource utilization.
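To make the idea of decoupled execution concrete, here is a minimal sketch, not the authors' implementation. It assumes LoRA-style low-rank adapters (the summary only says "PEFT adapters") and uses NumPy on a single linear layer. All class and function names (`SharedBaseLayer`, `LoRAAdapter`, `split_forward`) are hypothetical. The frozen base weight is evaluated once for the inputs of every adapter, while each adapter computes its own low-rank correction separately and keeps its own parameters.

```python
import numpy as np


class SharedBaseLayer:
    """Frozen base weight shared by all adapters (hypothetical name)."""

    def __init__(self, d_in, d_out, rng):
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        self.calls = 0  # counts base-side forward passes

    def forward(self, x):
        self.calls += 1
        return x @ self.W


class LoRAAdapter:
    """Low-rank adapter holding only its own parameters (assumed LoRA form)."""

    def __init__(self, d_in, d_out, rank, rng, alpha=1.0):
        self.A = rng.standard_normal((d_in, rank)) * 0.01
        self.B = np.zeros((rank, d_out))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def delta(self, x):
        # Adapter-side computation; never touches the base weight.
        return (x @ self.A) @ self.B * self.scale


def split_forward(base, adapters, batches):
    """Run one base pass for all jobs, then add each adapter's own delta.

    batches maps an adapter name to that job's input array (rows = tokens).
    """
    names = list(batches)
    sizes = [batches[n].shape[0] for n in names]
    stacked = np.concatenate([batches[n] for n in names], axis=0)
    base_out = base.forward(stacked)  # single shared base execution

    outputs, start = {}, 0
    for name, size in zip(names, sizes):
        rows = slice(start, start + size)
        outputs[name] = base_out[rows] + adapters[name].delta(batches[name])
        start += size
    return outputs
```

In this sketch the adapters can use different ranks and could live in a different process or device from the base layer, since the only data exchanged between the two sides are activations.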
Findings
Successfully fine-tuned 20 adapters on 8 GPUs.
Supports multiple adapters with shared base models.
Enhances resource management and privacy in LLM fine-tuning.
Abstract
Parameter-efficient fine-tuning (PEFT) allows model builders to capture task-specific parameters in adapters, which are a fraction of the size of the original base model. The popularity of PEFT techniques for fine-tuning has led to the creation of a large number of adapters for popular Large Language Models (LLMs). However, existing frameworks fall short in supporting inference or fine-tuning with multiple adapters in the following ways. 1) For fine-tuning, each job needs to deploy its own dedicated base model instance, which results in excessive GPU memory consumption and poor GPU utilization. 2) While popular inference platforms can serve multiple PEFT adapters, they do not allow independent resource management or mixing of different PEFT methods. 3) They cannot make effective use of heterogeneous accelerators. 4) They do not provide privacy to users who may not wish to expose their…
Taxonomy
Topics: Evolutionary Game Theory and Cooperation · Evolution and Genetic Dynamics
