Budget-aware Auto Optimizer Configurator

Kang Liu; Wei Peng; Jianchen Hu

arXiv:2605.04711 · cs.AI · May 7, 2026

TL;DR

BAOC is a method that assigns optimizer configurations to individual neural network blocks based on statistics of sampled gradients, reducing optimizer-state memory during training while maintaining performance.
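The abstract names directional stability and scale anisotropy as examples of the gradient statistics BAOC derives, but this page does not give their formulas. As a minimal sketch under that gap, the Python below computes two hypothetical proxies for one block from T sampled, flattened gradients: the mean cosine similarity between consecutive samples (stability) and the coefficient of variation of per-coordinate RMS magnitudes (anisotropy). The function name `gradient_stats` and both definitions are illustrative assumptions, not the paper's metrics.

```python
import numpy as np


def gradient_stats(grads, eps=1e-12):
    """Hypothetical per-block statistics from T >= 2 sampled gradients of shape (T, n).

    Returns (directional_stability, scale_anisotropy). Both definitions are
    illustrative proxies, not the formulas used by BAOC.
    """
    g = np.asarray(grads, dtype=np.float64)
    # Directional stability: mean cosine similarity of consecutive gradient samples.
    unit = g / (np.linalg.norm(g, axis=1, keepdims=True) + eps)
    stability = float(np.mean(np.sum(unit[1:] * unit[:-1], axis=1)))
    # Scale anisotropy: coefficient of variation of per-coordinate RMS magnitudes.
    rms = np.sqrt(np.mean(g ** 2, axis=0))
    anisotropy = float(np.std(rms) / (np.mean(rms) + eps))
    return stability, anisotropy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A block whose gradients barely change direction and have uniform scale.
    steady = np.ones((8, 4)) + 0.01 * rng.standard_normal((8, 4))
    # A block with random directions and coordinates spanning several orders of magnitude.
    noisy = rng.standard_normal((8, 4)) * np.array([10.0, 1.0, 0.1, 0.01])
    print("steady block:", gradient_stats(steady))
    print("noisy block:", gradient_stats(noisy))
```

A stability near 1 means successive gradients point the same way, and an anisotropy near 0 means coordinates have similar scales. How such numbers are turned into a risk score for a cheaper configuration is not specified on this page.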

Contribution

It introduces a budget-aware approach that assigns different optimizer configurations to different network blocks under given memory and time budgets, reducing memory cost without sacrificing training quality.

Findings

1. BAOC reduces optimizer memory by up to 50% in experiments.
2. It maintains comparable training performance with significantly lower memory costs.
3. The approach is effective across vision, language, and diffusion models.
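To make the memory figures concrete, the sketch below counts optimizer-state bytes per block under a hypothetical menu of configurations. The byte counts follow standard accounting (Adam keeps two moment buffers per parameter; fp32 takes 4 bytes and bf16 takes 2), but the configuration names, block names, and parameter counts are illustrative assumptions, not values from the paper.

```python
# Hypothetical menu: optimizer-state bytes per parameter for each configuration.
STATE_BYTES_PER_PARAM = {
    "adam_fp32": 8,         # two moment buffers, 4 bytes each
    "adam_bf16": 4,         # two moment buffers, 2 bytes each
    "no_momentum_fp32": 4,  # second-moment buffer only, 4 bytes
    "sgd_plain": 0,         # plain SGD keeps no persistent state
}


def optimizer_state_bytes(block_params, assignment):
    """Sum optimizer-state bytes over blocks, given {block: config} choices."""
    return sum(n * STATE_BYTES_PER_PARAM[assignment[block]]
               for block, n in block_params.items())


# Illustrative parameter counts per block (not taken from the paper).
BLOCK_PARAMS = {"embed": 50_000_000, "attn": 30_000_000, "mlp": 20_000_000}

baseline = optimizer_state_bytes(BLOCK_PARAMS, {b: "adam_fp32" for b in BLOCK_PARAMS})
mixed = optimizer_state_bytes(
    BLOCK_PARAMS, {"embed": "adam_bf16", "attn": "adam_fp32", "mlp": "no_momentum_fp32"}
)

if __name__ == "__main__":
    print(f"global Adam fp32: {baseline:,} bytes")
    print(f"mixed assignment: {mixed:,} bytes ({1 - mixed / baseline:.0%} saved)")
```

In this made-up assignment the state shrinks from 800 million to 520 million bytes, a 35% reduction. A 50% reduction would arise, for instance, if every block moved from 8 to 4 bytes of state per parameter; this is simple arithmetic, not a description of the paper's experiments.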

Abstract

Optimizer states occupy a large share of GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anisotropy, implying that expensive optimizer states are not universally necessary and that using a single global optimizer is often memory-inefficient. We propose the Budget-Aware Optimizer Configurator (BAOC) to reduce memory cost by assigning suitable optimizer configurations to individual blocks under given budgets. Specifically, BAOC samples gradient streams to derive statistical metrics that quantify the potential performance risk of applying cheaper configurations (e.g., low precision or removing momentum). It then solves a constrained allocation problem to minimize total risk under memory and time budgets, selecting a budget-feasible configuration for each block. Experiments across vision,…
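The allocation step in the abstract picks one configuration per block to minimize total risk under memory and time budgets, which has the shape of a multiple-choice knapsack problem. The paper's solver is not described on this page; the sketch below swaps in a plain exact dynamic program over integer memory and time units. The function `allocate`, the option tuples, and every risk and cost number are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (config name, risk, memory units, time units): all values are hypothetical.
Option = Tuple[str, float, int, int]


def allocate(blocks: Dict[str, List[Option]], mem_budget: int,
             time_budget: int) -> Optional[Tuple[float, Dict[str, str]]]:
    """Exact multiple-choice knapsack with two integer budgets.

    Chooses exactly one option per block, minimizing summed risk subject to
    total memory <= mem_budget and total time <= time_budget.
    Returns (total_risk, {block: config}) or None if no assignment fits.
    """
    # Map (memory used, time used) -> (lowest risk so far, choices achieving it).
    states: Dict[Tuple[int, int], Tuple[float, Dict[str, str]]] = {(0, 0): (0.0, {})}
    for block, options in blocks.items():
        nxt: Dict[Tuple[int, int], Tuple[float, Dict[str, str]]] = {}
        for (mem, t), (risk, choices) in states.items():
            for name, r, m, dt in options:
                key = (mem + m, t + dt)
                if key[0] > mem_budget or key[1] > time_budget:
                    continue  # prune partial assignments that already exceed a budget
                cand = risk + r
                if key not in nxt or cand < nxt[key][0]:
                    nxt[key] = (cand, {**choices, block: name})
        states = nxt
        if not states:
            return None  # every partial assignment broke a budget
    return min(states.values(), key=lambda s: s[0])


# Hypothetical menu: bf16 states are assumed to cost an extra time unit.
EXAMPLE_BLOCKS: Dict[str, List[Option]] = {
    "embed": [("adam_fp32", 0.0, 8, 1), ("adam_bf16", 0.1, 4, 2), ("sgd_plain", 0.9, 0, 1)],
    "attn": [("adam_fp32", 0.0, 6, 1), ("adam_bf16", 0.4, 3, 2), ("sgd_plain", 1.5, 0, 1)],
    "mlp": [("adam_fp32", 0.0, 4, 1), ("no_momentum_fp32", 0.2, 2, 1), ("sgd_plain", 0.6, 0, 1)],
}

if __name__ == "__main__":
    print(allocate(EXAMPLE_BLOCKS, mem_budget=12, time_budget=4))
    print(allocate(EXAMPLE_BLOCKS, mem_budget=12, time_budget=3))
```

With a time budget of 4 units, the cheapest-risk feasible choice stores the embedding block's moments in bf16 and drops momentum in the MLP block. Tightening the budget to 3 rules out the (assumed) slower bf16 option, so the solver instead drops optimizer state for the embedding block. The number of states is bounded by (mem_budget + 1) × (time_budget + 1), so a practical solver would likely use coarse units or a relaxation.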

Code & Models

Repositories

https://anonymous.4open.science/r/BAOC-45C6