HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization
Huaqin Zhao, Jiaxi Li, Yi Pan, Shizhe Liang, Xiaofeng Yang, Wei Liu, Xiang Li, Fei Dou, Tianming Liu, Jin Lu

TL;DR
HELENE is a novel optimizer that accelerates zeroth-order fine-tuning of large language models by combining second-order (curvature) information with gradient annealing, leading to faster convergence and improved accuracy.
Contribution
The paper introduces HELENE, a scalable, memory-efficient optimizer that combines layer-wise clipping of a diagonal Hessian estimate with gradient annealing, which substantially speeds up convergence when fine-tuning large models.
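To make these two ingredients concrete, here is a minimal Python sketch of a diagonally preconditioned update with per-layer Hessian clipping and an annealed momentum coefficient. It illustrates the general idea under stated assumptions and is not HELENE's actual update rule: the linear schedule in `annealed_beta`, the per-layer `(floor, ceil)` clipping bounds, and all function names are hypothetical, and the gradient and Hessian-diagonal estimates are taken as given inputs.

```python
def annealed_beta(step, total_steps, beta_start=0.9, beta_end=0.99):
    # Hypothetical schedule: the momentum coefficient grows linearly over training
    frac = min(step / max(total_steps, 1), 1.0)
    return beta_start + (beta_end - beta_start) * frac


def clip_layer_hessian(h_diag, floor, ceil):
    # Clip every diagonal curvature estimate of one layer into [floor, ceil]
    assert 0 < floor <= ceil, "floor must be positive to avoid division by zero"
    return [min(max(h, floor), ceil) for h in h_diag]


def preconditioned_step(params, grads, hess, momentum, step, total_steps, lr, clip_bounds):
    """One update on dicts mapping layer name -> list of floats.

    Returns (new_params, new_momentum); the input dicts are not modified.
    """
    beta = annealed_beta(step, total_steps)
    new_params, new_momentum = {}, {}
    for name, theta in params.items():
        floor, ceil = clip_bounds[name]  # hypothetical per-layer clipping bounds
        h = clip_layer_hessian(hess[name], floor, ceil)
        # Exponential moving average of the gradient estimate
        m = [beta * mi + (1 - beta) * gi for mi, gi in zip(momentum[name], grads[name])]
        # Newton-like step: divide by the clipped curvature, coordinate by coordinate
        new_params[name] = [t - lr * mi / hi for t, mi, hi in zip(theta, m, h)]
        new_momentum[name] = m
    return new_params, new_momentum


# Usage: two toy "layers" with very different curvature scales
params = {"layer1": [1.0, 2.0], "layer2": [0.5]}
grads = {"layer1": [0.2, -0.4], "layer2": [1.0]}
hess = {"layer1": [0.001, 50.0], "layer2": [2.0]}
momentum = {"layer1": [0.0, 0.0], "layer2": [0.0]}
clip_bounds = {"layer1": (0.1, 10.0), "layer2": (0.5, 5.0)}

new_params, new_momentum = preconditioned_step(
    params, grads, hess, momentum, step=0, total_steps=100, lr=0.1, clip_bounds=clip_bounds
)
print(new_params)
```

Clipping the curvature from below keeps a near-zero estimate (such as `0.001` in `layer1`) from producing a huge step, while the upper bound keeps a very sharp direction from freezing entirely; giving each layer its own bounds is one simple way to respect curvature that differs from layer to layer.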
Findings
Achieves up to 20x speedup over MeZO.
Improves average accuracy by 1.5%.
Compatible with full and parameter-efficient fine-tuning.
Abstract
Fine-tuning large language models (LLMs) poses significant memory challenges, as the back-propagation process demands extensive resources, especially as model sizes grow. Recent work, MeZO, addresses this issue with a zeroth-order (ZO) optimization method that reduces memory consumption to the level of the inference phase. However, MeZO converges slowly because curvature varies across model parameters. To overcome this limitation, we introduce HELENE, a novel, scalable, and memory-efficient optimizer that integrates annealed A-GNB gradients with diagonal Hessian estimation and layer-wise clipping, which together serve as a second-order preconditioner. This combination allows for faster and more stable convergence. Our theoretical analysis demonstrates that HELENE improves convergence rates, particularly for models with heterogeneous layer dimensions, by reducing the…
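For readers unfamiliar with MeZO-style zeroth-order optimization, the sketch below shows the two-point (SPSA-style) gradient estimate such methods rely on: the loss is evaluated at theta + eps*z and theta - eps*z for a random Gaussian direction z, and the scaled difference (a directional derivative) is multiplied back onto z. Regenerating z from a seed instead of storing it is what keeps memory close to inference level. This is plain ZO-SGD in pure Python; the function names and hyperparameters are illustrative assumptions, and HELENE's preconditioning, clipping, and annealing are deliberately left out.

```python
import random


def perturb(theta, seed, scale):
    # Regenerate the same Gaussian direction z from `seed` and add scale * z in place,
    # so z never has to be stored alongside the parameters
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] += scale * rng.gauss(0.0, 1.0)


def zo_sgd_step(theta, loss_fn, lr, eps, seed):
    """One two-point zeroth-order SGD step that uses only forward evaluations."""
    perturb(theta, seed, eps)        # theta + eps * z
    loss_plus = loss_fn(theta)
    perturb(theta, seed, -2 * eps)   # theta - eps * z
    loss_minus = loss_fn(theta)
    perturb(theta, seed, eps)        # back to theta (up to rounding)
    projected_grad = (loss_plus - loss_minus) / (2 * eps)
    # The gradient estimate is projected_grad * z, so the update is one more seeded perturbation
    perturb(theta, seed, -lr * projected_grad)
    return projected_grad


def quadratic(t):
    # Toy loss whose true gradient is t itself
    return 0.5 * sum(x * x for x in t)


# Usage: minimize the toy quadratic without ever computing its gradient
theta = [1.0, -2.0, 3.0]
for step in range(500):
    zo_sgd_step(theta, quadratic, lr=0.05, eps=1e-3, seed=step)
print(quadratic(theta))
```

Each step moves the parameters only along one random direction, so the estimate is noisy and convergence slows when curvature varies a lot across coordinates; that is the weakness a curvature-aware preconditioner is meant to address.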
