Second-Order, First-Class: A Composable Stack for Curvature-Aware Training

Mikalai Korbit; Mario Zanon

arXiv:2603.25976 · cs.LG · March 30, 2026

TL;DR

Somax is a composable, Optax-native stack for curvature-aware training. It runs each second-order update as a single JIT-compiled step governed by a static plan, with the aim of lowering the implementation overhead and tuning brittleness that keep second-order methods underused.

Contribution

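The paper separates planning from execution and exposes curvature operators, estimators, linear solvers, preconditioners, and damping policies as first-class modules behind a single step interface. Standard Optax gradient transformations are applied to the computed direction, so choices that are typically hidden become explicit and swappable, and the planned execution path reuses intermediate results to reduce per-step overhead.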

Findings

01. Composition choices impact scaling and accuracy.

02. Planning reduces per-step overhead.

03. System ablations show benefits of modular design.

Abstract

Second-order methods promise improved stability and faster convergence, yet they remain underused due to implementation overhead, tuning brittleness, and the lack of composable APIs. We introduce Somax, a composable Optax-native stack that treats curvature-aware training as a single JIT-compiled step governed by a static plan. Somax exposes first-class modules -- curvature operators, estimators, linear solvers, preconditioners, and damping policies -- behind a single step interface and composes with Optax by applying standard gradient transformations (e.g., momentum, weight decay, schedules) to the computed direction. This design makes typically hidden choices explicit and swappable. Somax separates planning from execution: it derives a static plan (including cadences) from module requirements, then runs the step through a specialized execution path that reuses intermediate results…
