CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian Sketching
Wenzhang Du (Mahanakorn University of Technology, International College, Bangkok, Thailand)

TL;DR
This paper introduces a curvature-adaptive optimizer that periodically sketches a low-rank Hessian subspace and preconditions gradients only within it. On CIFAR-100/ResNet-18 it reaches a train-loss threshold of 0.75 in 2.95x fewer epochs than Adam while matching final test accuracy.
Contribution
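The paper proposes a curvature-adaptive optimizer that periodically sketches a low-rank Hessian subspace using Hessian-vector products and preconditions the gradient only in that subspace, leaving the orthogonal complement first-order. This lets training enter the low-loss region earlier than first-order baselines such as Adam.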
Findings
Earlier entry into the low-loss region on CIFAR-10/100 with ResNet-18/34
Performance insensitive to sketch rank k, with k=0 as a baseline
Reaches the pre-declared train-loss threshold (0.75) in 2.95x fewer epochs than Adam on CIFAR-100/ResNet-18, with matching final test accuracy
Abstract
First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian-vector products and preconditions gradients only in that subspace, leaving the orthogonal complement first-order. For L-smooth non-convex objectives, we recover the standard O(1/T) stationarity guarantee with a widened stable stepsize range; under a Polyak-Łojasiewicz (PL) condition with bounded residual curvature outside the sketch, the loss contracts at refresh steps. On CIFAR-10/100 with ResNet-18/34, the method enters the low-loss region substantially earlier: measured by epochs to a pre-declared train-loss threshold (0.75), it reaches the threshold 2.95x faster than Adam on CIFAR-100/ResNet-18, while matching final test accuracy. The approach is one-knob: performance is insensitive to the sketch rank k…
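The idea described in the abstract can be illustrated with a minimal pure-Python sketch on a toy quadratic. This is not the paper's implementation: the finite-difference Hessian-vector product, the power iteration with deflation used to build the sketch, the Newton-like scaling 1/(|λ| + damping) inside the subspace, and every function name and parameter (`sketch_hessian`, `cao_like_step`, `refresh`, `damping`) are assumptions chosen for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axpy(alpha, x, y):
    """Return alpha * x + y for plain Python lists."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def hvp(grad_fn, w, v, eps=1e-4):
    """Finite-difference Hessian-vector product (assumed stand-in for autodiff)."""
    g0, g1 = grad_fn(w), grad_fn(axpy(eps, v, w))
    return [(b - a) / eps for a, b in zip(g0, g1)]

def sketch_hessian(grad_fn, w, k, iters=50, seed=0):
    """Estimate k leading eigenpairs using only Hessian-vector products."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in w]
        for _ in range(iters):
            v = hvp(grad_fn, w, v)
            for _, u in pairs:              # deflate directions already found
                v = axpy(-dot(u, v), u, v)
            norm = math.sqrt(dot(v, v))
            if norm == 0.0:                 # no curvature left to capture
                break
            v = [vi / norm for vi in v]
        pairs.append((dot(v, hvp(grad_fn, w, v)), v))
    return pairs

def cao_like_step(w, g, pairs, lr, damping=1e-8):
    """First-order step everywhere, Newton-like step inside the sketched subspace."""
    step = [lr * gi for gi in g]
    for lam, u in pairs:
        c = dot(u, g)
        # Swap the lr-scaled component along u for a curvature-scaled one.
        step = axpy(c * (1.0 / (abs(lam) + damping) - lr), u, step)
    return [wi - si for wi, si in zip(w, step)]

def run(grad_fn, w0, k, lr, steps=40, refresh=10):
    """Refresh the sketch every `refresh` steps; k=0 reduces to gradient descent."""
    w, pairs = list(w0), []
    for t in range(steps):
        if k > 0 and t % refresh == 0:
            pairs = sketch_hessian(grad_fn, w, k)
        w = cao_like_step(w, grad_fn(w), pairs, lr)
    return w

# Toy anisotropic quadratic: f(w) = 0.5 * sum(a_i * w_i^2).
A = [100.0, 10.0, 1.0, 1.0]
grad = lambda w: [a * wi for a, wi in zip(A, w)]
loss = lambda w: 0.5 * sum(a * wi * wi for a, wi in zip(A, w))
w0 = [1.0, 1.0, 1.0, 1.0]

print("k=2, lr=0.5:", loss(run(grad, w0, k=2, lr=0.5)))
print("k=0, lr=0.5:", loss(run(grad, w0, k=0, lr=0.5)))
```

On this toy problem, plain gradient descent is only stable for stepsizes below 2/100. Once the two sharpest directions are handled inside the sketch, the residual curvature is 1, so a stepsize of 0.5 becomes usable. This is one way to read the abstract's "widened stable stepsize range"; it says nothing about behavior on real networks.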
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · 3D Shape Modeling and Analysis
