Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving
Yuchen Zhang, Hanyue Du, Chun Cao, Jingwei Xu

TL;DR
Loquetier is a unified framework that runs LoRA fine-tuning and inference for large language models in a single runtime, using a virtualized module layer and an optimized computation flow to improve efficiency and throughput.
Contribution
It introduces a virtualized multi-LoRA framework with a shared base model, enabling unified fine-tuning and inference with high efficiency and flexibility.
Findings
Up to 3.0× throughput improvement over state-of-the-art systems
46.4× higher SLO attainment in unified tasks
Effective integration of fine-tuning and inference paths
Abstract
Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning (PEFT) technique for adapting large language models (LLMs) to downstream tasks. While prior work has explored strategies for integrating LLM training and serving, there still remains a gap in unifying fine-tuning and inference for LoRA-based models. We present Loquetier, a virtualized multi-LoRA framework that seamlessly integrates LoRA fine-tuning and serving within a single runtime. Loquetier introduces two key components: (1) a Virtualized Module that isolates PEFT-based modifications and supports multiple adapters on a shared base model, and (2) an optimized computation flow with a kernel design that merges fine-tuning and inference paths in forward propagation, enabling efficient batching and minimizing kernel invocation overhead. Extensive experiments across three task settings show that…
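To make the idea of several adapters on a shared base model concrete, here is a minimal pure-Python sketch of one linear layer whose frozen base weight is shared by multiple LoRA adapters, with each row of a mixed batch routed to its own adapter. This is not Loquetier's API or kernel: the names `SharedBaseLinear`, `LoRAAdapter`, `attach`, and `detach` are hypothetical, the per-row loop stands in for the fused kernel the abstract mentions, and only the forward pass is shown (no gradients for the fine-tuning rows).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Hypothetical container for one low-rank update: delta_W = scale * B @ A."""
    a: Matrix            # shape (r, d_in)
    b: Matrix            # shape (d_out, r)
    scale: float = 1.0


class SharedBaseLinear:
    """Illustrative layer: one frozen base weight, many attachable adapters."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight                      # shared; never modified
        self.adapters: Dict[str, LoRAAdapter] = {}

    def attach(self, name: str, adapter: LoRAAdapter) -> None:
        self.adapters[name] = adapter

    def detach(self, name: str) -> None:
        self.adapters.pop(name, None)

    def forward(self, batch: List[Vector],
                adapter_ids: List[Optional[str]]) -> List[Vector]:
        """Apply the base weight to every row, plus that row's adapter if any."""
        outputs = []
        for x, name in zip(batch, adapter_ids):
            y = matvec(self.weight, x)            # base path, shared by all rows
            if name is not None:
                ad = self.adapters[name]
                delta = matvec(ad.b, matvec(ad.a, x))   # B (A x), rank-r path
                y = [yi + ad.scale * di for yi, di in zip(y, delta)]
            outputs.append(y)
        return outputs


# One batch mixing a serving request, a fine-tuning sample, and a base-only row.
layer = SharedBaseLinear([[1.0, 0.0], [0.0, 1.0]])
layer.attach("serve", LoRAAdapter(a=[[1.0, 1.0]], b=[[1.0], [0.0]], scale=0.5))
layer.attach("train", LoRAAdapter(a=[[1.0, -1.0]], b=[[0.0], [2.0]]))
out = layer.forward([[1.0, 2.0], [3.0, 1.0], [4.0, 5.0]],
                    ["serve", "train", None])
print(out)  # [[2.5, 2.0], [3.0, 5.0], [4.0, 5.0]]
```

Because the base weight is never modified, adapters can be attached or detached without touching rows that use other adapters; a real system would replace the Python loop with batched GPU kernels.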
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification
