A Study of Optimizations for Fine-tuning Large Language Models

Arjun Singh; Nikhil Pandey; Anup Shirgaonkar; Pavan Manoj; Vijay Aski

arXiv:2406.02290 · cs.LG · June 7, 2024 · 1 citation

Open Access

TL;DR

This paper provides a comprehensive analysis of various optimization techniques for fine-tuning large language models, focusing on memory efficiency and runtime performance, and offers practical recommendations for different resource scenarios.

Contribution

It systematically evaluates multiple fine-tuning optimizations and proposes effective strategies and combinations for large models under resource constraints.

Findings

01. Gradient Checkpointing significantly reduces memory usage.

02. Well-chosen combinations of optimizations balance memory usage and runtime.

03. The recommended strategies enable fine-tuning of models with hundreds of billions of parameters.
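
The memory effect of Gradient Checkpointing can be illustrated with a rough counting model. The sketch below is not taken from the paper: it assumes every layer produces an activation of the same size, that a checkpoint is kept every `segment` layers, and that one segment at a time is recomputed during the backward pass. The function name `stored_activations` and both parameters are hypothetical.

```python
import math
from typing import Optional


def stored_activations(num_layers: int, segment: Optional[int] = None) -> int:
    """Rough count of layer activations held in memory at the backward-pass peak.

    Assumption (illustrative only): all activations have equal size.
    segment=None means no checkpointing, so every activation is kept.
    """
    if segment is None:
        return num_layers
    # Activations kept permanently at segment boundaries.
    checkpoints = math.ceil(num_layers / segment)
    # Activations recomputed for the single segment currently being back-propagated.
    recomputed = min(segment, num_layers)
    return checkpoints + recomputed


if __name__ == "__main__":
    layers = 64
    print("no checkpointing:", stored_activations(layers))
    print("checkpoint every 8 layers:", stored_activations(layers, segment=8))
```

Under these assumptions the stored count falls from the number of layers to roughly twice its square root when `segment` is chosen near the square root, at the cost of one extra forward computation per segment. Real savings depend on the model and framework and are not modeled here.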

Abstract

Fine-tuning large language models is a popular choice among users trying to adapt them for specific applications. However, fine-tuning these models is a demanding task because the user has to examine several factors, such as resource budget, runtime, model size, and context length, among others. A specific challenge is that fine-tuning is memory intensive, imposing constraints on the required hardware memory and on the context length of training data that can be handled. In this work, we share a detailed study on a variety of fine-tuning optimizations across different fine-tuning scenarios. In particular, we assess Gradient Checkpointing, Low-Rank Adaptation, DeepSpeed's Zero Redundancy Optimizer and FlashAttention. With a focus on memory and runtime, we examine the impact of different optimization combinations on GPU memory usage and execution runtime during the fine-tuning phase. We provide our…
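
To make the memory pressure concrete, the following sketch estimates per-GPU memory for model states under the Zero Redundancy Optimizer. It is an assumption-laden illustration, not the paper's measurement method: it uses the commonly cited accounting for full fine-tuning with mixed-precision Adam (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and ignores activations, buffers, and fragmentation. The function name and parameters are hypothetical.

```python
def model_state_bytes_per_gpu(num_params: float, num_gpus: int, zero_stage: int = 0) -> float:
    """Estimate bytes of model state per GPU for mixed-precision Adam.

    Assumed accounting (not from this study):
      weights 2 B/param, gradients 2 B/param, optimizer states 12 B/param.
    ZeRO stage 1 partitions optimizer states, stage 2 also partitions
    gradients, and stage 3 also partitions the weights.
    """
    weights = 2.0 * num_params
    grads = 2.0 * num_params
    optim = 12.0 * num_params
    if zero_stage >= 1:
        optim /= num_gpus
    if zero_stage >= 2:
        grads /= num_gpus
    if zero_stage >= 3:
        weights /= num_gpus
    return weights + grads + optim


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for stage in range(4):
        gib = model_state_bytes_per_gpu(params, num_gpus=8, zero_stage=stage) / 2**30
        print(f"ZeRO stage {stage}: {gib:.1f} GiB of model state per GPU")
```

The estimate covers model states only. Activation memory, which Gradient Checkpointing and FlashAttention target, grows with batch size and context length and would need to be added separately.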

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Gradient Checkpointing · Focus · ZeRO