BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Qijun Luo; Hengxu Yu; Xiao Li

arXiv:2404.02827 · cs.LG · November 18, 2024 · 1 citation

TL;DR

BAdam is a memory-efficient optimization method for large language models that combines block coordinate descent with Adam, enabling effective finetuning with reduced memory usage and competitive performance.

Contribution

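The paper introduces BAdam, an optimizer built on block coordinate descent (BCD) with Adam's update rule. It makes full parameter finetuning of large language models more memory efficient, and is supported by a convergence analysis in the deterministic case and by finetuning experiments.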

Findings

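01 BAdam reduces memory usage during full parameter finetuning: Llama 3-8B is finetuned on a single RTX3090-24GB GPU.

02 On MT-bench and math benchmarks, BAdam outperforms memory efficient baselines such as LoRA and achieves comparable or even superior performance compared to Adam.

03 BAdam finetunes large models on limited hardware: Llama 3-70B is finetuned on 4 A100-80GB GPUs.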
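A rough back-of-the-envelope sketch may help show why updating one block at a time can shrink the memory footprint. The byte counts here are assumptions for illustration, not figures from this page: 2 bytes per parameter for half-precision weights, and 16 bytes per trainable parameter for a full-precision copy, gradient, and the two Adam moment estimates. The block count of 32 and the nominal 8e9 parameter count are likewise hypothetical, and activation memory is ignored.

```python
def training_memory_gb(num_params, trainable_fraction=1.0,
                       bytes_weights=2, bytes_per_trainable=16):
    """Estimate memory for weights plus optimizer-related state, in GB.

    bytes_weights and bytes_per_trainable are assumed accounting constants
    (half-precision weights; fp32 copy + gradient + two Adam moments).
    """
    weights = num_params * bytes_weights
    trainable_state = num_params * trainable_fraction * bytes_per_trainable
    return (weights + trainable_state) / 1e9

num_params = 8e9      # nominal size of an 8B-parameter model (assumption)
num_blocks = 32       # hypothetical number of equally sized blocks

# All parameters trainable at once, as in a standard Adam setup.
full_gb = training_memory_gb(num_params)

# Only one block trainable at a time, so state is kept for 1/32 of the model.
block_gb = training_memory_gb(num_params, trainable_fraction=1 / num_blocks)

print(f"all parameters active: {full_gb:.1f} GB")
print(f"one block active:      {block_gb:.1f} GB")
```

Under these assumed constants the optimizer-related state scales with the active block rather than the whole model, which is the intuition behind fitting an 8B-parameter model on a 24 GB GPU.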

Abstract

This work presents BAdam, an optimization method that leverages the block coordinate descent (BCD) framework with Adam's update rule. BAdam offers a memory efficient approach to the full parameter finetuning of large language models. We conduct a theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to finetune the Llama 3-8B and Llama 3-70B models using a single RTX3090-24GB GPU and 4 A100-80GB GPUs, respectively. The results confirm BAdam's efficiency in terms of memory usage, running time, and optimization capability. Furthermore, the downstream performance evaluation based on MT-bench and math benchmarks shows that BAdam outperforms existing memory efficient baselines such as LoRA. It also demonstrates that BAdam can achieve comparable or even superior performance compared to Adam. Finally, the ablation study using SGD's update rule…
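The idea of combining block coordinate descent with Adam's update rule can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: the quadratic objective, the fixed cyclic block order, the number of inner steps, and the choice to reset the Adam state whenever a new block becomes active are all assumptions made for this example, and every function name is hypothetical.

```python
import math

def loss(x, target):
    """Toy objective: f(x) = 0.5 * sum((x_i - target_i)^2)."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def grad_block(x, target, block):
    """Partial gradient of the toy objective for the coordinates in `block`."""
    return [x[i] - target[i] for i in block]

def block_adam(x0, target, blocks, epochs=20, inner_steps=10,
               lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Cycle over blocks; run a few Adam steps on the active block only."""
    x = list(x0)
    for _ in range(epochs):
        for block in blocks:
            # Adam moments exist only for the active block (assumed reset
            # per block), so optimizer state is proportional to block size.
            m = [0.0] * len(block)
            v = [0.0] * len(block)
            for t in range(1, inner_steps + 1):
                g = grad_block(x, target, block)
                for j, i in enumerate(block):
                    m[j] = beta1 * m[j] + (1 - beta1) * g[j]
                    v[j] = beta2 * v[j] + (1 - beta2) * g[j] ** 2
                    m_hat = m[j] / (1 - beta1 ** t)   # bias correction
                    v_hat = v[j] / (1 - beta2 ** t)
                    x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

target = [1.0, -2.0, 0.5, 3.0]
x0 = [0.0, 0.0, 0.0, 0.0]
blocks = [[0, 1], [2, 3]]          # hypothetical partition into two blocks

x_final = block_adam(x0, target, blocks)
initial_loss = loss(x0, target)
final_loss = loss(x_final, target)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

All coordinates are eventually updated, so this remains full parameter optimization, but at any moment gradients and Adam moments are held for one block only.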

Code & Models

Repositories

ledzy/badam (PyTorch, official)

Videos

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: LLaMA · Adam