Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Yishun Lu, Junhao Zhang, Zeyu Yang, and Wes Armour

arXiv:2605.16184·cs.DC·May 18, 2026

TL;DR

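Asteria is a runtime system for scalable second-order optimization in large language model training. It distributes optimizer state across GPU memory, CPU memory, and optional NVMe storage, and runs the expensive inverse-root computations asynchronously on the host while GPU computation continues.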

Contribution

Asteria introduces a runtime approach that separates second-order optimization logic from the critical GPU training path, making second-order LLM training practical at large scale.

Findings

01. Supports second-order training on a 1B-parameter model with limited GPU memory.

02. Reduces optimizer overhead and latency spikes on multi-node systems.

03. Accelerates convergence and maintains optimization benefits in large models.

Abstract

Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remove this bottleneck by separating second-order optimization logic from the critical GPU training path. Rather than keeping all preconditioner state on the accelerator, Asteria dynamically distributes optimizer state across GPU memory, CPU memory, and optional NVMe storage according to architectural constraints and runtime pressure. It further uses training hooks to prepare shadow states in advance, allowing expensive inverse-root computations to proceed asynchronously on the host while GPU computation continues. For distributed training, Asteria employs a bounded-staleness protocol that limits synchronization…
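The shadow-state idea in the abstract can be illustrated with a small sketch. This is not Asteria's API: the class `ShadowPreconditioner`, its methods, and the helper `inverse_root` are hypothetical names chosen for illustration. Two simplifying assumptions are made: a diagonal (elementwise) statistic stands in for the matrix-based preconditioner, and a background thread stands in for the host-side worker. The training loop keeps using the last active preconditioner while a snapshot of the statistics is processed off the critical path.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional


def inverse_root(stats: List[float], power: int = 4, eps: float = 1e-8) -> List[float]:
    """Elementwise (s + eps) ** (-1 / power).

    Assumption: a diagonal stand-in for the matrix inverse root.
    """
    return [(s + eps) ** (-1.0 / power) for s in stats]


class ShadowPreconditioner:
    """Hypothetical sketch: active preconditioner plus an asynchronously built shadow."""

    def __init__(self, size: int) -> None:
        self.stats = [0.0] * size    # accumulated squared gradients
        self.active = [1.0] * size   # preconditioner currently applied to gradients
        self._pending: Optional[Future] = None
        self._pool = ThreadPoolExecutor(max_workers=1)

    def accumulate(self, grad: List[float]) -> None:
        """Fold a new gradient into the running statistics."""
        self.stats = [s + g * g for s, g in zip(self.stats, grad)]

    def launch_refresh(self) -> None:
        """Snapshot the statistics (the shadow state) and start the background job."""
        if self._pending is None:
            shadow = list(self.stats)  # copy, so later accumulation cannot race with the job
            self._pending = self._pool.submit(inverse_root, shadow)

    def swap_if_ready(self, wait: bool = False) -> bool:
        """Install the refreshed preconditioner if the job finished (or block if wait=True)."""
        if self._pending is None:
            return False
        if not wait and not self._pending.done():
            return False
        self.active = self._pending.result()
        self._pending = None
        return True

    def precondition(self, grad: List[float]) -> List[float]:
        """Apply the active preconditioner; never waits on the background job."""
        return [p * g for p, g in zip(self.active, grad)]

    def close(self) -> None:
        """Shut down the worker thread."""
        self._pool.shutdown(wait=True)
```

In a training loop under these assumptions, `accumulate` and `launch_refresh` would be called from a hook after the backward pass, `precondition` on every step, and `swap_if_ready` at step boundaries. The update path reads only `active`, so a slow inverse-root job delays the refresh rather than the step.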
