BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models
Qijun Luo, Hengxu Yu, Xiao Li

TL;DR
BAdam is a memory-efficient optimization method for large language models that combines block coordinate descent with Adam, enabling effective finetuning with reduced memory usage and competitive performance.
Contribution
It introduces BAdam, a novel optimizer based on block coordinate descent (BCD) that improves memory efficiency and finetuning capability for large language models, with theoretical and empirical validation.
Findings
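BAdam substantially reduces memory usage during full parameter finetuning.
On MT-bench and math benchmarks, BAdam outperforms memory-efficient baselines such as LoRA and achieves performance comparable or superior to Adam.
BAdam finetunes Llama 3-8B on a single RTX3090-24GB GPU and Llama 3-70B on 4 A100-80GB GPUs.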
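A back-of-envelope estimate shows why updating one block at a time can shrink the memory budget. This is a hypothetical accounting, not the paper's: it assumes a common mixed-precision layout (16-bit weights and gradients, fp32 master weights and Adam moments), equal-sized blocks, and 32 blocks for an 8B-parameter model (for example, one block per transformer layer), and it ignores activations, embeddings, and framework overhead.

```python
def adam_bytes(n_params: float) -> float:
    """Hypothetical full Adam cost: 16-bit weights (2) + 16-bit grads (2)
    + fp32 master weights (4) + fp32 first moment (4) + fp32 second moment (4)."""
    return n_params * (2 + 2 + 4 + 4 + 4)


def bcd_adam_bytes(n_params: float, n_blocks: int) -> float:
    """Hypothetical block-wise cost: 16-bit weights for every parameter, but
    gradients, fp32 master weights and Adam moments only for the active block."""
    active = n_params / n_blocks
    return n_params * 2 + active * (2 + 4 + 4 + 4)


n = 8e9  # roughly the parameter count of an 8B model
print(adam_bytes(n) / 1e9, "GB for full Adam (assumed layout)")
print(bcd_adam_bytes(n, 32) / 1e9, "GB for one active block out of 32 (assumed layout)")
```

Under these assumptions the estimate is 128 GB versus 19.5 GB before activations, which suggests why a 24 GB card becomes plausible; the paper's measured figures may differ.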
Abstract
This work presents BAdam, an optimization method that leverages the block coordinate descent (BCD) framework with Adam's update rule. BAdam offers a memory-efficient approach to the full parameter finetuning of large language models. We conduct a theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to finetune the Llama 3-8B and Llama 3-70B models using a single RTX3090-24GB GPU and 4 A100-80GB GPUs, respectively. The results confirm BAdam's efficiency in terms of memory usage, running time, and optimization capability. Furthermore, the downstream performance evaluation based on MT-bench and math benchmarks shows that BAdam outperforms existing memory-efficient baselines such as LoRA. It also demonstrates that BAdam can achieve comparable or even superior performance compared to Adam. Finally, the ablation study using SGD's update rule…
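The abstract describes BAdam as block coordinate descent combined with Adam's update rule. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. Several details are assumptions of this sketch: blocks are visited in a fixed cyclic order, each active block receives a fixed number of Adam steps (`inner_steps`), and the Adam moments are rebuilt whenever the active block changes. The function name `bcd_adam`, all defaults, and the toy quadratic objective standing in for an LLM loss are hypothetical. In a real model, a block might be one transformer layer.

```python
import math
from typing import Callable, List


def bcd_adam(
    x0: List[float],
    grad: Callable[[List[float]], List[float]],
    blocks: List[List[int]],
    inner_steps: int = 50,  # hypothetical: Adam steps per active block
    epochs: int = 20,       # passes over all blocks
    lr: float = 0.05,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> List[float]:
    """Block coordinate descent whose block subproblems are handled by Adam steps."""
    x = list(x0)
    for _ in range(epochs):
        for block in blocks:  # visit blocks in a fixed cyclic order
            # Adam moments exist only for the active block; rebinding them here
            # discards the previous block's optimizer state.
            m = {i: 0.0 for i in block}
            v = {i: 0.0 for i in block}
            for t in range(1, inner_steps + 1):
                g = grad(x)  # full gradient for brevity; only active entries are read
                for i in block:
                    m[i] = beta1 * m[i] + (1 - beta1) * g[i]
                    v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
                    m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected first moment
                    v_hat = v[i] / (1 - beta2 ** t)  # bias-corrected second moment
                    x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Usage: separable quadratic f(x) = 0.5 * sum_i a_i * (x_i - b_i)^2 split into two blocks.
a = [1.0, 2.0, 3.0, 4.0]
b = [3.0, -2.0, 4.0, 1.0]


def quad_grad(x: List[float]) -> List[float]:
    return [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]


x_final = bcd_adam([0.0] * 4, quad_grad, blocks=[[0, 1], [2, 3]])
print([round(xi, 2) for xi in x_final])
```

In this sketch only the active block's `m` and `v` exist at any time, which is where the memory saving comes from. The trade-off is that each block restarts with fresh moment estimates after every switch. A memory-conscious implementation would also materialize only the active block's gradient instead of the full vector computed here.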
Code & Models
- 🤗 trollek/danube2-1.8b-SlimOrcaDedup (model, 3 downloads)
- 🤗 trollek/danube2-1.8b-Neural (model, 5 downloads)
- 🤗 trollek/danube2-1.8b-airoboros-3.2 (model, 3 downloads)
- 🤗 trollek/danube2-1.8b-openhermes (model, 2 downloads)
- 🤗 trollek/danube2-1.8b-WizardLM-Evol-V2-Unfiltered (model, 5 downloads)
- 🤗 trollek/danube2-1.8b-SystemChat-1.1 (model, 4 downloads)
- 🤗 trollek/danube2-1.8b-glaive-function-calling-v2 (model, 1 download)
- 🤗 trollek/danube2-1.8b-CodeFeedback (model, 3 downloads)
- 🤗 sunatte/txt2sql (model)
- 🤗 trollek/danube2-1.8b-Tess-v1.5 (model, 4 downloads)
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: LLaMA · Adam
