A Hessian-informed hyperparameter optimization for differential learning rate

Shiyun Xu; Zhiqi Bu; Yiliang Zhang; Ian Barnett

arXiv:2501.06954 · cs.LG · May 20, 2025

TL;DR

This paper introduces Hi-DLR, a Hessian-informed method for hyperparameter optimization of differential learning rates, improving convergence by adaptively capturing loss curvature during training.

Contribution

It proposes an efficient Hessian-informed approach for hyperparameter optimization of differential learning rates that adaptively captures loss curvature for any model and optimizer.

Findings

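01. Given a proper grouping of parameters, Hi-DLR improves convergence by determining the learning rates dynamically during training.

02. It captures loss curvature adaptively and uses it to solve the hyperparameter optimization of learning rates.

03. The approach is stated to apply to any model and optimizer.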

Abstract

Differential learning rate (DLR), a technique that applies different learning rates to different model parameters, has been widely used in deep learning and has achieved empirical success in its various forms. For example, parameter-efficient fine-tuning (PEFT) applies zero learning rates to most parameters, which significantly reduces computational cost. At its core, DLR leverages the observation that different parameters can have different loss curvature, which is hard to characterize in general. We propose the Hessian-informed differential learning rate (Hi-DLR), an efficient approach that solves the hyperparameter optimization (HPO) of learning rates and adaptively captures the loss curvature for any model and optimizer. Given a proper grouping of parameters, we empirically demonstrate that Hi-DLR can improve convergence by dynamically determining the learning rates during the…
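
To make the idea of curvature-aware, per-group learning rates concrete, here is a minimal sketch on a toy quadratic loss. It is not the paper's algorithm: the abstract does not say how Hi-DLR computes or approximates curvature. The explicit Hessian `H`, the two hand-picked parameter groups, and every function name below are assumptions made for illustration. The sketch picks one learning rate per group by minimizing the second-order Taylor expansion of the loss along the group-masked gradients, then compares the result with the best single shared learning rate.

```python
# Illustrative sketch only, NOT the paper's algorithm.
# Toy loss: L(w) = 0.5 * w^T H w, with an explicit (hypothetical) Hessian H.

def matvec(H, v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(H, w):
    return 0.5 * dot(w, matvec(H, w))

def masked(g, group):
    """Keep the gradient entries that belong to `group`; zero out the rest."""
    keep = set(group)
    return [x if i in keep else 0.0 for i, x in enumerate(g)]

def two_group_rates(H, g, groups):
    """Minimize L(w) - eta^T b + 0.5 * eta^T A eta over eta = (eta_1, eta_2),
    where b_k = g^T g_k and A_kl = g_k^T H g_l (g_k = gradient masked to group k).
    The minimizer solves the 2x2 system A eta = b (Cramer's rule below)."""
    g1, g2 = masked(g, groups[0]), masked(g, groups[1])
    b1, b2 = dot(g, g1), dot(g, g2)
    Hg1, Hg2 = matvec(H, g1), matvec(H, g2)
    a11, a12, a22 = dot(g1, Hg1), dot(g1, Hg2), dot(g2, Hg2)
    det = a11 * a22 - a12 * a12  # assumed nonzero in this toy setting
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

def step(w, g, groups, etas):
    """One gradient step with a separate learning rate for each group."""
    w_new = list(w)
    for group, eta in zip(groups, etas):
        for i in group:
            w_new[i] -= eta * g[i]
    return w_new

# Two groups with very different curvature (100 vs. 1).
H = [[100.0, 0.0, 0.0, 0.0],
     [0.0, 100.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
w = [1.0, 1.0, 1.0, 1.0]
g = matvec(H, w)                 # gradient of the quadratic loss at w
groups = [[0, 1], [2, 3]]

etas = two_group_rates(H, g, groups)
loss_dlr = loss(H, step(w, g, groups, etas))

# Baseline: the best single learning rate under the same quadratic model.
eta_single = dot(g, g) / dot(g, matvec(H, g))
loss_single = loss(H, step(w, g, groups, [eta_single, eta_single]))

print("per-group rates:", etas)
print("loss after one step (per-group vs. single):", loss_dlr, loss_single)
```

In this toy problem the high-curvature group receives a much smaller learning rate than the low-curvature group, and one per-group step reduces the loss far more than one step with the best shared rate. Passing `etas = [0.0, eta]` to `step` freezes the first group, which mirrors the zero learning rates the abstract attributes to PEFT. A real network would not form `H` explicitly.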
