PLoRA: Efficient LoRA Hyperparameter Tuning for Large Models
Minghao Yan, Zhuang Wang, Zhen Jia, Shivaram Venkataraman, Yida Wang

TL;DR
PLoRA is a system that speeds up hyperparameter tuning for LoRA fine-tuning of large language models. It orchestrates concurrent fine-tuning jobs under the available hardware constraints, significantly reducing training time and increasing throughput.
Contribution
We introduce PLoRA, which automatically orchestrates concurrent LoRA fine-tuning jobs and provides efficient kernels, improving training efficiency under hardware constraints.
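To show what packing concurrent LoRA jobs under a hardware constraint can look like, here is a minimal sketch that groups hyperparameter configurations onto GPUs using first-fit decreasing bin packing. This is a standard greedy heuristic chosen for illustration, not PLoRA's planner. The LoRAJob and GPUSlot types, the estimate_mem_gb cost model, and all numbers are hypothetical. The sketch also assumes that the frozen base model is loaded once per GPU and shared by every adapter packed there.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class LoRAJob:
    """One hyperparameter configuration in the sweep (hypothetical fields)."""
    name: str
    rank: int
    batch_size: int


def estimate_mem_gb(job: LoRAJob) -> float:
    # Hypothetical cost model: adapter and optimizer state grow with rank,
    # activations grow with batch size. Real values depend on the model,
    # sequence length, and precision.
    return 0.05 * job.rank + 1.5 * job.batch_size


@dataclass
class GPUSlot:
    capacity_gb: float
    base_model_gb: float  # assumption: frozen base weights loaded once, shared by all jobs here
    jobs: List[LoRAJob] = field(default_factory=list)

    def free_gb(self) -> float:
        used = self.base_model_gb + sum(estimate_mem_gb(j) for j in self.jobs)
        return self.capacity_gb - used


def pack_jobs(jobs: List[LoRAJob], num_gpus: int, capacity_gb: float,
              base_model_gb: float) -> Tuple[List[GPUSlot], List[LoRAJob]]:
    """First-fit decreasing: place the largest jobs first, each on the first GPU
    with enough free memory. Jobs that fit nowhere are deferred to a later wave."""
    gpus = [GPUSlot(capacity_gb, base_model_gb) for _ in range(num_gpus)]
    deferred: List[LoRAJob] = []
    for job in sorted(jobs, key=estimate_mem_gb, reverse=True):
        for gpu in gpus:
            if estimate_mem_gb(job) <= gpu.free_gb():
                gpu.jobs.append(job)
                break
        else:  # no GPU had room for this job
            deferred.append(job)
    return gpus, deferred


# Example: six configurations (3 ranks x 2 batch sizes) on two 40 GB GPUs.
jobs = [LoRAJob(f"r{r}_bs{bs}", r, bs) for r in (8, 16, 64) for bs in (2, 4)]
gpus, deferred = pack_jobs(jobs, num_gpus=2, capacity_gb=40.0, base_model_gb=20.0)
print([[j.name for j in gpu.jobs] for gpu in gpus])
# [['r64_bs4', 'r16_bs4', 'r16_bs2'], ['r8_bs4', 'r64_bs2', 'r8_bs2']]
print([j.name for j in deferred])  # []
```

Memory fit is the only criterion in this greedy sketch. A practical scheduler would also weigh per-job throughput and makespan when deciding which configurations to co-locate.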
Findings
PLoRA reduces fine-tuning makespan by up to 7.52x.
PLoRA improves training throughput by up to 12.8x.
Experimental results demonstrate significant efficiency gains across various LLMs.
Abstract
Low-rank Adaptation (LoRA) has gained popularity as a fine-tuning approach for Large Language Models (LLMs) due to its low resource requirements and good performance. While a plethora of work has investigated improving LoRA serving efficiency by serving multiple LoRAs concurrently, existing methods assume that a wide range of LoRA adapters are available for serving. In our work, we conduct extensive empirical studies to identify that current training paradigms do not utilize hardware resources efficiently and require high overhead to obtain a performant LoRA. Leveraging these insights, we propose PLoRA, which automatically orchestrates concurrent LoRA fine-tuning jobs under given hardware and model constraints and develops performant kernels to improve training efficiency. Our experimental studies show that PLoRA reduces the makespan of LoRA fine-tuning over a given hyperparameter…
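Running several LoRA jobs concurrently pays off because every job shares the same frozen base weight. The following pure-Python sketch applies the standard LoRA forward pass, h = xW + (alpha / r)(xA)B, to several adapters of different ranks in a single batch. The frozen W is applied once to the concatenated inputs, and each job's low-rank update touches only its own rows. This illustrates the batching idea under stated assumptions; it does not reproduce PLoRA's kernels. The function names, the segment layout, and the example shapes are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
# Hypothetical adapter layout: (A: d_in x r, B: r x d_out, alpha)
Adapter = Tuple[Matrix, Matrix, float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense product of a (n x k) and b (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def multi_lora_forward(segments: Sequence[Tuple[str, Matrix]], W: Matrix,
                       adapters: Dict[str, Adapter]) -> Matrix:
    """Compute h = xW + (alpha / r) * (xA)B for every job's rows.

    segments: (adapter_id, input rows) per job, concatenated in order.
    The frozen W is multiplied once over all rows; each low-rank update
    touches only its own segment and never materializes the d_in x d_out AB.
    """
    all_rows = [row for _, rows in segments for row in rows]
    base = matmul(all_rows, W)  # one shared matmul with the frozen base weight
    out: Matrix = []
    start = 0
    for adapter_id, rows in segments:
        A, B, alpha = adapters[adapter_id]
        scale = alpha / len(B)  # len(B) is the rank r
        delta = matmul(matmul(rows, A), B)  # (xA)B: cheap because r is small
        for i, d_row in enumerate(delta):
            out.append([h + scale * d for h, d in zip(base[start + i], d_row)])
        start += len(rows)
    return out


def rand_matrix(n: int, m: int) -> Matrix:
    return [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]


# Example: two jobs with ranks 2 and 1 sharing a 4 x 3 frozen weight.
# B is random here for illustration; LoRA normally initializes B to zero.
random.seed(0)
W = rand_matrix(4, 3)
adapters = {
    "job_r2": (rand_matrix(4, 2), rand_matrix(2, 3), 16.0),
    "job_r1": (rand_matrix(4, 1), rand_matrix(1, 3), 8.0),
}
segments = [("job_r2", rand_matrix(3, 4)), ("job_r1", rand_matrix(2, 4))]
out = multi_lora_forward(segments, W, adapters)
print(len(out), len(out[0]))  # 5 3
```

Each adapter adds only r x (d_in + d_out) trainable parameters, compared with d_in x d_out for W. That gap is what makes it cheap to run many rank and alpha settings side by side on one copy of the base model.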
