Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving

Yuchen Zhang; Hanyue Du; Chun Cao; Jingwei Xu

arXiv:2511.00101 · cs.LG · November 4, 2025

TL;DR

Loquetier is a unified framework that runs LoRA fine-tuning and inference for large language models in a single runtime, improving throughput and SLO attainment through module virtualization and an optimized computation flow.

Contribution

The paper introduces a virtualized multi-LoRA framework in which multiple adapters share one base model, so that fine-tuning and inference can run together within a single runtime.

Findings

01. Up to 3.0× throughput improvement over state-of-the-art systems

02. Up to 46.4× higher SLO attainment in unified fine-tuning and inference tasks

03. Effective integration of fine-tuning and inference paths
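
As a reading aid for the SLO figure above, the sketch below computes SLO attainment under the common assumption that it is the fraction of requests whose latency stays within a target. This listing does not state the paper's exact definition, so the function name `slo_attainment`, the latency values, and the 1.0 s target are all hypothetical.

```python
from typing import Sequence


def slo_attainment(latencies_s: Sequence[float], slo_s: float) -> float:
    """Fraction of requests whose latency is within the SLO target.

    Assumed definition for illustration; the paper may measure it differently.
    """
    if not latencies_s:
        raise ValueError("need at least one request latency")
    met = sum(1 for latency in latencies_s if latency <= slo_s)
    return met / len(latencies_s)


# Hypothetical latencies (seconds) for four requests and a 1.0 s target.
example_latencies = [0.2, 0.5, 1.2, 0.9]
example_attainment = slo_attainment(example_latencies, slo_s=1.0)
```

Under this reading, an "N× higher SLO attainment" claim would be the ratio of two such fractions measured on the same workload.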

Abstract

Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning (PEFT) technique for adapting large language models (LLMs) to downstream tasks. While prior work has explored strategies for integrating LLM training and serving, there still remains a gap in unifying fine-tuning and inference for LoRA-based models. We present Loquetier, a virtualized multi-LoRA framework that seamlessly integrates LoRA fine-tuning and serving within a single runtime. Loquetier introduces two key components: (1) a Virtualized Module that isolates PEFT-based modifications and supports multiple adapters on a shared base model, and (2) an optimized computation flow with a kernel design that merges fine-tuning and inference paths in forward propagation, enabling efficient batching and minimizing kernel invocation overhead. Extensive experiments across three task settings show that…
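
To make the shared-base idea concrete, here is a minimal sketch of one frozen linear layer serving a mixed batch in which each row is bound to a different LoRA adapter (or to none). It is not Loquetier's Virtualized Module or its kernel: the names `SharedLinear` and `LoRAAdapter`, the pure-Python per-row loop, and the toy weights are all assumptions chosen for readability, whereas the paper describes a merged kernel for this step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    A: Matrix           # down-projection, shape (r, d_in)
    B: Matrix           # up-projection, shape (d_out, r)
    scale: float = 1.0  # LoRA scaling factor (alpha / r)


class SharedLinear:
    """One frozen base weight shared by several named LoRA adapters."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        self.adapters: Dict[str, LoRAAdapter] = {}

    def add_adapter(self, name: str, adapter: LoRAAdapter) -> None:
        self.adapters[name] = adapter

    def forward(self, batch: List[Vector],
                adapter_ids: List[Optional[str]]) -> List[Vector]:
        """Compute y = W x + scale * B (A x) per row, with a per-row adapter."""
        outputs = []
        for x, name in zip(batch, adapter_ids):
            y = matvec(self.weight, x)  # base path, shared by every row
            if name is not None:
                ad = self.adapters[name]
                delta = matvec(ad.B, matvec(ad.A, x))  # low-rank update
                y = [yi + ad.scale * di for yi, di in zip(y, delta)]
            outputs.append(y)
        return outputs


# Toy example: identity base weight, one adapter being fine-tuned ("ft")
# and one being served ("serve"), both hypothetical.
layer = SharedLinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("ft", LoRAAdapter(A=[[1.0, 0.0]], B=[[0.0], [2.0]], scale=0.5))
layer.add_adapter("serve", LoRAAdapter(A=[[0.0, 1.0]], B=[[1.0], [0.0]]))

x = [3.0, 4.0]
out = layer.forward([x, x, x], ["ft", "serve", None])
```

The base weight is stored once and never modified, while each adapter contributes only its low-rank update, so fine-tuning rows and inference rows can sit in the same batch. A real system would replace the Python loop with batched GPU kernels and would track gradients only for the adapters being trained.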

Videos

Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification