Understanding the Performance and Estimating the Cost of LLM Fine-Tuning
Yuchen Xia, Jiho Kim, Yuhan Chen, Haojie Ye, Souvik Kundu, Cong Hao, and Nishil Talati

TL;DR
This paper analyzes the accuracy, runtime performance, and cost of fine-tuning sparse Mixture of Experts (MoE) Large Language Models on a single GPU, and provides an analytical cost estimation model for practitioners.
Contribution
It characterizes sparse MoE LLM fine-tuning performance and develops a cost estimation model based on GPU and model parameters.
Findings
Sparse MoE models achieve competitive accuracy with efficient GPU utilization.
Optimization of MoE layers significantly improves fine-tuning performance.
The analytical cost model accurately predicts training throughput and expenses.
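To make the relationship between throughput and expense concrete, here is a minimal sketch of the final cost arithmetic. It is not the paper's analytical model. It assumes a throughput figure (queries per second) is already available from profiling or from a predictor, and that the GPU is billed at a flat hourly rate. The function name, its parameters, and the numbers in the usage example are all hypothetical.

```python
def estimate_finetune_cost(num_queries, epochs, throughput_qps, gpu_hourly_usd):
    """Estimate GPU-hours and dollar cost for a fine-tuning run.

    Assumptions (illustrative, not taken from the paper):
      - throughput_qps is a steady-state training throughput in queries/second
      - the GPU is billed at a flat hourly rate with no startup overhead
    """
    if throughput_qps <= 0:
        raise ValueError("throughput_qps must be positive")
    if num_queries < 0 or epochs < 0 or gpu_hourly_usd < 0:
        raise ValueError("inputs must be non-negative")

    total_queries = num_queries * epochs        # queries processed over all epochs
    seconds = total_queries / throughput_qps    # wall-clock training time
    gpu_hours = seconds / 3600.0
    cost_usd = gpu_hours * gpu_hourly_usd
    return gpu_hours, cost_usd


if __name__ == "__main__":
    # Hypothetical run: 36,000 queries, 10 epochs, 2 queries/s, $1.50 per GPU-hour.
    hours, cost = estimate_finetune_cost(36_000, 10, 2.0, 1.5)
    print(f"{hours:.1f} GPU-hours, ${cost:.2f}")
```

In this simplified view the throughput term carries all the dependence on the GPU and the model, so the quality of the estimate rests entirely on how well throughput is measured or predicted.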
Abstract
Due to the cost-prohibitive nature of training Large Language Models (LLMs), fine-tuning has emerged as an attractive alternative for specializing LLMs for specific tasks using limited compute resources in a cost-effective manner. In this paper, we characterize sparse Mixture of Experts (MoE) based LLM fine-tuning to understand their accuracy and runtime performance on a single GPU. Our evaluation provides unique insights into the training efficacy of sparse and dense versions of MoE models, as well as their runtime characteristics, including maximum batch size, execution time breakdown, end-to-end throughput, GPU hardware utilization, and load distribution. Our study identifies the optimization of the MoE layer as crucial for further improving the performance of LLM fine-tuning. Using our profiling results, we also develop and validate an analytical model to estimate the cost of LLM…
Taxonomy
Topics: Iterative Learning Control Systems · Distributed and Parallel Computing Systems · Simulation Techniques and Applications
Methods: Mixture of Experts
