Kourkoutas-Beta: A Sunspike-Driven Adam Optimizer with Desert Flair

Stavros C. Kassinos

arXiv:2508.12996 · cs.LG · August 22, 2025

TL;DR

Kourkoutas-Beta introduces a dynamic second-moment adjustment to the Adam optimizer. The adjustment is driven by gradient-spike detection and improves stability and performance when training physics-based neural networks, without significant runtime overhead.

Contribution

The paper proposes a novel sunspike-driven adaptive beta2 mechanism for Adam that improves robustness and convergence in challenging neural network training scenarios.

Findings

1. Significantly reduces bits-per-character on the enwik8 dataset.

2. Improves stability and final loss in physics-based neural network tasks.

3. Maintains Adam-style convergence guarantees.

Abstract

Transformer neural networks are increasingly used for physics-based problems. In data-driven PDE surrogates, training samples from varying boundary and initial conditions can cause erratic losses and spiky gradients; in physics-informed neural networks (PINNs), stiff composite losses amplify this effect. We introduce Kourkoutas-Beta, an Adam-style optimizer in which the fixed second-moment discount beta2 is replaced by a layer-wise dynamic value driven by a bounded "sunspike" ratio: the current pooled gradient norm divided by an exponential moving average (EMA) of past norms, squashed to the interval [0,1). Spikes lower beta2 toward beta2_min; calm phases keep it near beta2_max. Options include leaky-AMSGrad (decay), trust-region clipping (max_ratio), adaptive tiny terms, and several bias-correction modes ("none", "beta2max", "exact"). With all features off and…
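The sunspike mechanism described in the abstract can be sketched for a single layer as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the class name `SunspikeAdamLayer`, the squash s = r/(1+r), the EMA factor `alpha`, the numeric defaults, the seeding of the norm history on the first step, and the bias correction by the running product of beta2 values are all hypothetical choices. None of the optional features (leaky-AMSGrad, trust-region clipping, adaptive tiny terms) are included. With this particular squash, a steady ratio of 1 gives s = 0.5, so beta2 approaches beta2_max only when the current norm falls well below its running average; the paper's exact squash and defaults may differ.

```python
import numpy as np


class SunspikeAdamLayer:
    """Hypothetical single-layer sketch: Adam with a sunspike-driven beta2.

    Illustrative only. Names, defaults, the squash function and the bias
    correction are assumptions, not the paper's reference implementation.
    """

    def __init__(self, shape, lr=1e-3, beta1=0.9, beta2_min=0.88,
                 beta2_max=0.999, alpha=0.9, eps=1e-8):
        self.lr, self.beta1, self.eps = lr, beta1, eps
        self.beta2_min, self.beta2_max = beta2_min, beta2_max
        self.alpha = alpha          # EMA factor for the norm history (assumed)
        self.m = np.zeros(shape)    # first moment
        self.v = np.zeros(shape)    # second moment
        self.norm_ema = None        # EMA of past pooled gradient norms
        self.beta1_prod = 1.0       # product of beta1 over steps (bias correction)
        self.beta2_prod = 1.0       # product of the dynamic beta2 values

    def sunspike(self, grad):
        """Bounded spike score in [0, 1) for this layer's gradient."""
        norm = float(np.linalg.norm(grad))          # pooled layer gradient norm
        if self.norm_ema is None:                   # seed the history on step 1 (assumed)
            self.norm_ema = norm
        ratio = norm / (self.norm_ema + self.eps)   # current norm vs. EMA of past norms
        self.norm_ema = self.alpha * self.norm_ema + (1 - self.alpha) * norm
        return ratio / (1.0 + ratio)                # squash [0, inf) into [0, 1)

    def step(self, param, grad):
        """Update param in place and return the beta2 used for this step."""
        s = self.sunspike(grad)
        # Spikes (s -> 1) pull beta2 toward beta2_min; small s keeps it near beta2_max.
        beta2 = self.beta2_max - (self.beta2_max - self.beta2_min) * s
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = beta2 * self.v + (1 - beta2) * grad * grad
        # Dividing by 1 - prod(beta) is exact for a constant gradient,
        # even when beta2 changes from step to step.
        self.beta1_prod *= self.beta1
        self.beta2_prod *= beta2
        m_hat = self.m / (1 - self.beta1_prod)
        v_hat = self.v / (1 - self.beta2_prod)
        param -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return beta2


# Usage: a calm phase of identical gradients, then one 100x gradient spike.
layer = SunspikeAdamLayer(shape=(3,))
w = np.zeros(3)
calm_betas = [layer.step(w, np.full(3, 0.1)) for _ in range(20)]
spike_beta = layer.step(w, np.full(3, 10.0))
```

In the usage lines, twenty identical gradients keep the norm ratio at 1, so beta2 settles at the midpoint of [beta2_min, beta2_max]. The spike raises the ratio to about 100, which pulls beta2 close to beta2_min. This shortens the second-moment memory, so v absorbs the large squared gradient quickly and the resulting step is damped.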
