# Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models

**Authors:** Tatyana Matveeva, Aleksandr Katrutsa, Evgeny Frolov

arXiv: 2508.21106 · 2025-09-01

## TL;DR

AdaGram is a full-matrix adaptive optimizer that captures parameter correlations through a low-rank approximation of the preconditioner. It converges faster than, or on par with, diagonal adaptive optimizers while keeping memory and compute costs manageable.
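For reference, one common form of the full-matrix AdaGrad update (Duchi et al., 2011) is shown below next to its diagonal counterpart and a rank-r surrogate. The exact regularization, the placement of ε, and the notation are assumptions chosen for illustration. AdaGram's own formulation may differ.

```latex
% Full-matrix AdaGrad with regularizer \varepsilon > 0 (memory O(d^2), inverse root O(d^3))
G_t = \sum_{\tau=1}^{t} g_\tau g_\tau^\top, \qquad
x_{t+1} = x_t - \eta \, (G_t + \varepsilon I)^{-1/2} g_t

% Diagonal AdaGrad keeps only the diagonal of G_t (memory O(d))
x_{t+1} = x_t - \eta \, \big(\operatorname{diag}(G_t) + \varepsilon I\big)^{-1/2} g_t

% Low-rank surrogate with U_t \in \mathbb{R}^{d \times r}, r \ll d (memory O(dr))
G_t \approx U_t \operatorname{diag}(s_t) \, U_t^\top
```

The off-diagonal entries of `G_t` encode correlations between parameters. The diagonal version discards them, and the low-rank surrogate retains the dominant ones at a fraction of the full-matrix cost.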

## Contribution

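The paper presents AdaGram, an optimizer that approximates full-matrix preconditioning efficiently. It computes the preconditioned update direction with a fast symmetric factorization and keeps the preconditioner low-rank along the optimization trajectory using matrix integrator methods.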
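The NumPy sketch below shows why a low-rank preconditioner makes the update direction cheap. It assumes the preconditioner is approximated as `G ≈ U diag(s) Uᵀ`, where `U` is a d×r matrix with orthonormal columns. In that case, `(G + εI)^{-1/2} g = ε^{-1/2} g + U diag((s+ε)^{-1/2} − ε^{-1/2}) Uᵀ g`, which costs O(dr) instead of O(d³). The helper names `eig_from_factor` and `lowrank_precondition`, the ε placement, and the thin-QR-plus-small-eigenproblem route are hypothetical illustrations. This is not the paper's fast symmetric factorization.

```python
import numpy as np

def eig_from_factor(B):
    """Eigenpairs of B @ B.T from a thin factor B (d x k) in O(d k^2) work.

    Hypothetical helper: B could hold, e.g., a few stacked gradients.
    """
    Q, R = np.linalg.qr(B)              # B = Q R, Q is d x k with orthonormal columns
    s, W = np.linalg.eigh(R @ R.T)      # small k x k symmetric eigenproblem
    s = np.clip(s, 0.0, None)           # guard against tiny negative round-off
    return Q @ W, s                     # B B^T = (Q W) diag(s) (Q W)^T

def lowrank_precondition(U, s, g, eps=1e-8):
    """Return (U diag(s) U^T + eps*I)^{-1/2} g without forming a d x d matrix.

    Assumes U (d x r) has orthonormal columns and s >= 0 holds its r eigenvalues.
    """
    coeffs = U.T @ g                                   # project g onto span(U): O(d r)
    scale = 1.0 / np.sqrt(s + eps) - 1.0 / np.sqrt(eps)
    # eps^{-1/2} on the orthogonal complement, (s + eps)^{-1/2} on span(U)
    return g / np.sqrt(eps) + U @ (scale * coeffs)
```

Only d×r and r×r arrays are ever formed, so memory stays at O(dr). A dense reference that builds the full d×d matrix gives the same direction, which is a convenient correctness check for small d.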

## Key findings

- AdaGram converges faster than, or matches, diagonal adaptive optimizers on standard machine learning tasks.
- These results are obtained with preconditioner approximations of rank five or smaller.
- The experiments suggest AdaGram's potential as a scalable adaptive optimizer for large models.

## Abstract

Adaptive gradient methods like Adagrad and its variants are widespread in large-scale optimization. However, their use of diagonal preconditioning matrices limits the ability to capture parameter correlations. Full-matrix adaptive methods, approximating the exact Hessian, can model these correlations and may enable faster convergence. At the same time, their computational and memory costs are often prohibitive for large-scale models. To address this limitation, we propose AdaGram, an optimizer that enables efficient full-matrix adaptive gradient updates. To reduce memory and computational overhead, we utilize fast symmetric factorization for computing the preconditioned update direction at each iteration. Additionally, we maintain the low-rank structure of a preconditioner along the optimization trajectory using matrix integrator methods. Numerical experiments on standard machine learning tasks show that AdaGram converges faster or matches the performance of diagonal adaptive optimizers when using rank five and smaller rank approximations. This demonstrates AdaGram's potential as a scalable solution for adaptive optimization in large models.
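One well-known matrix integrator for dynamical low-rank approximation is the projector-splitting (K-S-L) scheme of Lubich and Oseledets (2014). The sketch below applies one step of it to track a rank-r factorization `U S Vᵀ` when the accumulated matrix receives a gradient outer product `g gᵀ`. Using this particular integrator, the function name `ksl_rank_one_update`, and the generic non-symmetric splitting are assumptions for illustration. AdaGram may use a different or symmetric variant.

```python
import numpy as np

def ksl_rank_one_update(U0, S0, V0, g):
    """One projector-splitting step for Y ~ U S V^T under the increment g g^T.

    Hypothetical helper. U0, V0: d x r with orthonormal columns; S0: r x r.
    The d x d increment g g^T is never formed explicitly.
    """
    gV = g @ V0                              # g^T V0, shape (r,)
    # K-step: K1 = U0 S0 + (g g^T) V0, then re-orthonormalize the column basis
    K1 = U0 @ S0 + np.outer(g, gV)
    U1, S_hat = np.linalg.qr(K1)
    gU = g @ U1                              # g^T U1, shape (r,)
    # S-step: backward substep S_tilde = S_hat - U1^T (g g^T) V0
    S_tilde = S_hat - np.outer(gU, gV)
    # L-step: L1 = V0 S_tilde^T + (g g^T)^T U1, then re-orthonormalize the row basis
    L1 = V0 @ S_tilde.T + np.outer(g, gU)
    V1, S1_T = np.linalg.qr(L1)
    return U1, S1_T.T, V1
```

Each step costs O(dr²) for the two thin QR factorizations, so the factors can be updated once per iteration as gradients stream in. When `g` already lies in the span of the current basis, the step reproduces `U0 S0 V0ᵀ + g gᵀ` exactly, which reflects the exactness property of this integrator.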

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21106/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21106/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/2508.21106/full.md

---
Source: https://tomesphere.com/paper/2508.21106