tLoRA: Efficient Multi-LoRA Training with Elastic Shared Super-Models
Kevin Li, Dibyadeep Saha, Avni Kanodia, Fan Lai

TL;DR
tLoRA is a framework for efficient multi-LoRA training: it fuses jobs into a shared super-model and adaptively schedules heterogeneous adapters, improving training throughput, job completion time, and GPU utilization.
Contribution
tLoRA introduces an elastic shared super-model and a fused kernel with adaptive scheduling to optimize concurrent LoRA training of diverse jobs.
Findings
Training throughput increased by up to 1.8x
Job completion time reduced by up to 5.4x
GPU utilization improved by 37%
Abstract
As Low-Rank Adaptation (LoRA) becomes the standard approach for efficiently fine-tuning large language models (LLMs), shared clusters increasingly execute many concurrent LoRA training jobs over the same frozen backbone. While recent advances enable batching (co-locating) multiple adapters during serving, efficient training-time co-location of heterogeneous LoRA adapters presents unique challenges. Jobs often differ in adapter rank, batch size, and resource allocation, and naive batching can introduce synchronization stalls, communication overheads, and per-job slowdowns that are worse than executing independently. We introduce tLoRA, a framework that enables efficient batch training of multiple LoRA jobs. tLoRA fuses adapters that share the same base model into an elastic shared super-model, exploiting existing distributed training frameworks to derive parallelism plans that share…
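To make the idea of co-locating heterogeneous adapters on one frozen backbone concrete, here is a minimal pure-Python sketch. It is not tLoRA's fused kernel or scheduler: the function names (`matmul`, `lora_forward`, `fused_forward`), the job dictionary layout, and the toy matrices are all hypothetical and chosen only for illustration. It assumes the standard LoRA forward pass y = xW + s(xA)B and shows that jobs with different ranks and batch sizes can share a single pass through the frozen weight W while keeping their own low-rank updates.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale_matrix(m: Matrix, s: float) -> Matrix:
    """Multiply every entry of m by the scalar s."""
    return [[s * v for v in row] for row in m]


def lora_forward(x: Matrix, w: Matrix, a: Matrix, b: Matrix, s: float) -> Matrix:
    """One independent LoRA job: y = x W + s * (x A) B."""
    return add(matmul(x, w), scale_matrix(matmul(matmul(x, a), b), s))


def fused_forward(jobs: List[Dict], w: Matrix) -> List[Matrix]:
    """Hypothetical co-located forward pass.

    The rows of all jobs are stacked and sent through the frozen weight once;
    each job then adds its own low-rank update, whatever its rank.
    """
    stacked = [row for job in jobs for row in job["x"]]
    base = matmul(stacked, w)  # single shared pass over the frozen backbone
    outputs, start = [], 0
    for job in jobs:
        n = len(job["x"])
        delta = scale_matrix(matmul(matmul(job["x"], job["a"]), job["b"]), job["s"])
        outputs.append(add(base[start:start + n], delta))
        start += n
    return outputs


# Toy frozen weight (2 x 2) and two jobs that differ in rank and batch size.
w = [[1.0, 0.0], [0.0, 1.0]]
jobs = [
    # rank 1, batch size 1
    {"x": [[1.0, 2.0]], "a": [[1.0], [0.0]], "b": [[0.5, 0.5]], "s": 2.0},
    # rank 2, batch size 2
    {"x": [[1.0, 0.0], [0.0, 1.0]], "a": [[1.0, 0.0], [0.0, 1.0]],
     "b": [[1.0, 2.0], [3.0, 4.0]], "s": 1.0},
]

fused = fused_forward(jobs, w)
independent = [lora_forward(j["x"], w, j["a"], j["b"], j["s"]) for j in jobs]
```

In this sketch `fused` and `independent` agree job by job, so sharing the base projection leaves each job's output unchanged. The sketch says nothing about the costs the abstract highlights (synchronization stalls, communication overheads, per-job slowdowns), which only appear on real hardware with real schedulers.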
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · IoT and Edge/Fog Computing
