LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM

Yehonathan Refael, Iftach Arbel, Ofir Lindenbaum, Tom Tirer

arXiv:2502.19571 · cs.LG · February 28, 2025

TL;DR

This paper introduces LORENZA, a low-rank gradient optimization method, and AdaZo-SAM, a zeroth-order adaptive SAM framework. Both are designed to improve generalization and efficiency in large language model training and fine-tuning.

Contribution

The paper proposes LORENZA and AdaZo-SAM, novel methods that enhance generalization and reduce memory usage in LLM training through low-rank and zeroth-order optimization techniques.

Findings

01

LORENZA enables full-parameter fine-tuning with low memory consumption.

02

AdaZo-SAM improves generalization while requiring only a single gradient computation per iteration, using a stochastic zeroth-order estimate of SAM's ascent perturbation.

03

Both methods are theoretically analyzed and empirically validated on LLM tasks.
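
The excerpt does not spell out LORENZA's low-rank mechanism (the abstract is cut off), so the following is only an illustration of the general idea behind low-rank gradient training, not the paper's algorithm. It assumes a GaLore-style scheme, swapped in here as an assumption: project each weight matrix's gradient onto a rank-r subspace, keep Adam's moments in that small space, and map the update back to the full matrix. The class name `LowRankAdam`, the SVD-based subspace refresh, and the `update_every` parameter are hypothetical choices for this sketch.

```python
import numpy as np


class LowRankAdam:
    """Hypothetical low-rank gradient optimizer for a single m x n weight matrix.

    Assumption: GaLore-style gradient projection, not LORENZA's published method.
    Adam moments are kept in an r x n projected space instead of m x n.
    """

    def __init__(self, shape, rank, lr=1e-2, update_every=100,
                 beta1=0.9, beta2=0.999, eps=1e-8):
        m, n = shape
        if not 0 < rank <= min(m, n):
            raise ValueError("rank must be in [1, min(m, n)]")
        self.rank, self.lr, self.update_every = rank, lr, update_every
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.P = None                 # m x r orthonormal basis of the gradient subspace
        self.m = np.zeros((rank, n))  # first moment, stored in the r x n space
        self.v = np.zeros((rank, n))  # second moment, stored in the r x n space
        self.t = 0

    def step(self, W, G):
        """Return W updated with the full gradient G projected onto a rank-r subspace."""
        if self.P is None or self.t % self.update_every == 0:
            # Periodically refresh the subspace: top-r left singular vectors of G.
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            self.P = U[:, :self.rank]
        self.t += 1
        R = self.P.T @ G  # compressed r x n gradient
        self.m = self.beta1 * self.m + (1 - self.beta1) * R
        self.v = self.beta2 * self.v + (1 - self.beta2) * R * R
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Map the r x n Adam direction back to the full m x n parameter space.
        return W - self.lr * (self.P @ (m_hat / (np.sqrt(v_hat) + self.eps)))


# Toy usage: fit a rank-2 target with loss 0.5 * ||W - W_star||_F^2 (gradient W - W_star).
rng = np.random.default_rng(0)
W_star = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
W = np.zeros((8, 6))
opt = LowRankAdam(W.shape, rank=2, lr=0.1)
initial_loss = 0.5 * np.sum((W - W_star) ** 2)
for _ in range(1000):
    W = opt.step(W, W - W_star)
final_loss = 0.5 * np.sum((W - W_star) ** 2)
print(f"loss: {initial_loss:.3f} -> {final_loss:.6f}")
```

The memory saving comes from the optimizer state: here the two Adam moments are 2 x 6 arrays instead of 8 x 6, while the weights themselves are still updated in full.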

Abstract

We study robust parameter-efficient fine-tuning (PEFT) techniques designed to improve accuracy and generalization while operating within strict computational and memory hardware constraints, specifically focusing on large language models (LLMs). Existing PEFT methods often lack robustness and fail to generalize effectively across diverse tasks, leading to suboptimal performance in real-world scenarios. To address this, we present a new, highly computationally efficient framework called AdaZo-SAM, which combines Adam and Sharpness-Aware Minimization (SAM) while requiring only a single gradient computation per iteration. This is achieved by using a stochastic zeroth-order estimate to find SAM's ascent perturbation. We provide a convergence guarantee for AdaZo-SAM and show that it improves the generalization ability of state-of-the-art PEFT methods. Additionally, we design a low-rank…
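
The AdaZo-SAM idea described in the abstract can be sketched as three steps: estimate SAM's ascent direction with forward passes only, take the one true gradient at the perturbed point, and feed that gradient to Adam. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-point Gaussian zeroth-order estimator, SAM's usual normalized perturbation of radius `rho`, and a standard bias-corrected Adam update. The function names and all hyperparameters (`rho`, `mu`, `lr`) are hypothetical, and the paper's exact estimator and normalization may differ.

```python
import numpy as np


def zo_ascent_direction(loss_fn, w, mu, rng, num_samples=1):
    """Two-point zeroth-order estimate of grad loss_fn(w) using random Gaussian probes.

    Assumption: Gaussian central-difference estimator. Only forward
    evaluations of loss_fn are used; no backward pass is needed.
    """
    g_hat = np.zeros_like(w)
    for _ in range(num_samples):
        u = rng.standard_normal(w.shape)
        # Central finite difference of the loss along the random direction u.
        slope = (loss_fn(w + mu * u) - loss_fn(w - mu * u)) / (2.0 * mu)
        g_hat += slope * u
    return g_hat / num_samples


def init_state(w):
    """Adam state: step counter plus first and second moment estimates."""
    return {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}


def adazo_sam_step(loss_fn, grad_fn, w, state, rng, lr=1e-2, rho=0.05, mu=1e-3,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical AdaZo-SAM-style step: ZO ascent perturbation, one gradient, Adam update."""
    # 1) SAM ascent perturbation, estimated without a backward pass.
    g_hat = zo_ascent_direction(loss_fn, w, mu, rng)
    perturbation = rho * g_hat / (np.linalg.norm(g_hat) + 1e-12)
    # 2) The single first-order gradient, taken at the perturbed point w + perturbation.
    g = grad_fn(w + perturbation)
    # 3) Standard bias-corrected Adam update driven by that gradient.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)


# Toy usage on a quadratic loss 0.5 * ||w||^2 (its gradient is w).
def quad_loss(w):
    return 0.5 * float(np.dot(w, w))


def quad_grad(w):
    return w


rng = np.random.default_rng(0)
w = np.ones(5)
state = init_state(w)
for _ in range(500):
    w = adazo_sam_step(quad_loss, quad_grad, w, state, rng, lr=0.05)
print(f"final loss: {quad_loss(w):.6f}")
```

In this sketch, plain SAM's extra backward pass is replaced by two extra forward passes per probe; that trade is what keeps the per-iteration cost at a single gradient computation.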

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications

Methods: Adam · Sharpness-Aware Minimization