ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang

TL;DR
ALTO is a training system that accelerates LoRA hyperparameter tuning for large language models and improves GPU utilization when heterogeneous tasks share a cluster.
Contribution
ALTO introduces a co-designed training system that optimizes LoRA tuning and cluster sharing, leveraging loss monitoring and novel parallelism techniques.
Findings
Achieves up to 13.8× speedup over existing methods
Effectively terminates unpromising configurations early
Improves GPU utilization in heterogeneous multi-task settings
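As a rough illustration of what loss-based early termination can look like, the sketch below applies a simple median stopping rule to the training losses of concurrently running LoRA configurations. This is not ALTO's actual criterion, which the summary does not specify; the function name `select_survivors`, the `warmup_steps` parameter, and the configuration names are all hypothetical.

```python
from statistics import median

def select_survivors(loss_histories, warmup_steps=3):
    """Return the set of configuration names to keep training.

    Hypothetical median stopping rule (an assumption, not ALTO's method):
    a configuration is terminated once it has at least `warmup_steps`
    recorded losses and its latest loss is above the median latest loss
    of all configurations that have passed warmup.
    """
    # Latest loss of every configuration that has passed the warmup period.
    mature = {
        name: history[-1]
        for name, history in loss_histories.items()
        if len(history) >= warmup_steps
    }
    if not mature:
        # Nothing can be judged yet, so keep everything.
        return set(loss_histories)

    cutoff = median(mature.values())
    survivors = set()
    for name in loss_histories:
        # Keep configurations still in warmup, and those at or below the median.
        if name not in mature or mature[name] <= cutoff:
            survivors.add(name)
    return survivors

# Illustrative loss curves for four made-up LoRA configurations.
histories = {
    "rank8_lr1e-4": [2.1, 1.7, 1.4],
    "rank16_lr3e-4": [2.0, 1.5, 1.1],
    "rank4_lr1e-3": [2.3, 2.4, 2.6],   # diverging, a candidate for termination
    "rank32_lr1e-5": [2.2],            # still in warmup, always kept
}
survivors = select_survivors(histories)
```

In this example the diverging configuration is dropped while the one still in warmup is retained; a real system would likely use a more robust signal than the single latest loss value.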
Abstract
Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices. In practice, this leads to many concurrent LoRA jobs, often spanning heterogeneous tasks in multi-tenant environments. Existing systems largely handle these jobs independently, which both wastes computation on weak candidates and leaves GPUs underutilized. We present ALTO (Adaptive LoRA Tuning and Orchestration), a co-designed training system that accelerates LoRA hyperparameter tuning while enabling efficient cluster sharing across heterogeneous tasks. The central insight behind ALTO is that when multiple tuning jobs run concurrently over a shared frozen backbone, they expose optimization opportunities that single-job…
