Understanding the Performance and Estimating the Cost of LLM Fine-Tuning

Yuchen Xia, Jiho Kim, Yuhan Chen, Haojie Ye, Souvik Kundu, Cong Hao, and Nishil Talati

arXiv:2408.04693 · cs.CL · August 15, 2024 · 2 cites

TL;DR

This paper characterizes the accuracy and runtime performance of fine-tuning sparse Mixture of Experts (MoE) Large Language Models on a single GPU, identifies the MoE layer as the main optimization target, and provides an analytical model that practitioners can use to estimate fine-tuning cost.

Contribution

It characterizes sparse MoE LLM fine-tuning performance and develops a cost estimation model based on GPU and model parameters.

Findings

01. Sparse MoE models achieve competitive accuracy with efficient GPU utilization.

02. Optimization of MoE layers significantly improves fine-tuning performance.

03. The analytical cost model accurately predicts training throughput and expenses.

Abstract

Due to the cost-prohibitive nature of training Large Language Models (LLMs), fine-tuning has emerged as an attractive alternative for specializing LLMs for specific tasks using limited compute resources in a cost-effective manner. In this paper, we characterize sparse Mixture of Experts (MoE) based LLM fine-tuning to understand their accuracy and runtime performance on a single GPU. Our evaluation provides unique insights into the training efficacy of sparse and dense versions of MoE models, as well as their runtime characteristics, including maximum batch size, execution time breakdown, end-to-end throughput, GPU hardware utilization, and load distribution. Our study identifies the optimization of the MoE layer as crucial for further improving the performance of LLM fine-tuning. Using our profiling results, we also develop and validate an analytical model to estimate the cost of LLM…
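The link between measured throughput and a dollar estimate can be illustrated with simple arithmetic. The sketch below is not the paper's analytical model, which is not reproduced on this page; it only assumes that cost scales with the number of queries processed, divided by a measured throughput, times an hourly GPU price. The function name, parameters, and all numeric values are hypothetical.

```python
def estimate_finetune_cost(num_queries, epochs, throughput_qps, gpu_hourly_usd):
    """Return (gpu_hours, cost_usd) for a single-GPU fine-tuning run.

    Illustrative only: assumes constant throughput (queries per second)
    and a flat hourly GPU rental price. Not the model from the paper.
    """
    if throughput_qps <= 0:
        raise ValueError("throughput_qps must be positive")
    total_queries = num_queries * epochs           # queries processed overall
    gpu_hours = total_queries / throughput_qps / 3600.0
    return gpu_hours, gpu_hours * gpu_hourly_usd


# Hypothetical inputs: 36,000 queries, 10 epochs, 2 queries/s, $1.50 per GPU-hour.
hours, cost = estimate_finetune_cost(
    num_queries=36_000, epochs=10, throughput_qps=2.0, gpu_hourly_usd=1.5
)
print(f"{hours:.1f} GPU-hours, ${cost:.2f}")  # prints "50.0 GPU-hours, $75.00"
```

A throughput figure for a real run would have to come from profiling on the target GPU, since it depends on the model, batch size, and sequence length.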

Code & Models

Repositories

stsxxx/finetune (PyTorch, official)

Taxonomy

Topics: Iterative Learning Control Systems · Distributed and Parallel Computing Systems · Simulation Techniques and Applications

Methods: Mixture of Experts