Scalable Thermodynamic Second-order Optimization

Kaelan Donatella; Samuel Duffield; Denis Melanson; Maxwell Aifer; Phoebe Klett; Rajath Salegame; Zach Belateche; Gavin Crooks; Antonio J. Martinez; Patrick J. Coles

arXiv:2502.08603·cs.ET·February 13, 2025

Open Access

TL;DR

This paper introduces a scalable algorithm that uses thermodynamic hardware to accelerate second-order optimization in AI training, with a predicted advantage that grows as neural networks get larger.

Contribution

The paper proposes an algorithm for running a second-order optimizer, Kronecker-factored approximate curvature (K-FAC), on thermodynamic hardware, supported by an asymptotic complexity analysis and numerical experiments on its efficiency and robustness to noise.

Findings

01 Asymptotic advantage increases with network size

02 Second-order optimization benefits are preserved under quantization noise

03 Predicted substantial speedups for large-scale vision and graph tasks

Abstract

Many hardware proposals have aimed to accelerate inference in AI workloads. Less attention has been paid to hardware acceleration of training, despite the enormous societal impact of rapid training of AI models. Physics-based computers, such as thermodynamic computers, offer an efficient means to solve key primitives in AI training algorithms. Optimizers that normally would be computationally out-of-reach (e.g., due to expensive matrix inversions) on digital hardware could be unlocked with physics-based hardware. In this work, we propose a scalable algorithm for employing thermodynamic computers to accelerate a popular second-order optimizer called Kronecker-factored approximate curvature (K-FAC). Our asymptotic complexity analysis predicts increasing advantage with our algorithm as $n$, the number of neurons per layer, increases. Numerical experiments show that even under significant…
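To make the "expensive matrix inversions" concrete, the following is a minimal sketch of a standard digital K-FAC preconditioning step for a single dense layer, written with NumPy. It is not the thermodynamic algorithm proposed in the paper. The function name `kfac_precondition`, its arguments, and the factor-wise damping value are assumptions chosen for illustration. The two linear solves against the Kronecker factors are the kind of step whose cost grows with the layer width $n$.

```python
import numpy as np

def kfac_precondition(grad_W, acts, backprops, damping=1e-2):
    """Illustrative K-FAC step for one dense layer (hypothetical helper).

    grad_W:    (n_out, n_in) gradient of the loss w.r.t. the weights
    acts:      (batch, n_in) layer inputs
    backprops: (batch, n_out) gradients w.r.t. the layer pre-activations
    """
    batch = acts.shape[0]
    # Kronecker factors: second moments of inputs and of backpropagated gradients.
    A = acts.T @ acts / batch            # (n_in, n_in)
    G = backprops.T @ backprops / batch  # (n_out, n_out)
    # Assumption: simple factor-wise Tikhonov damping to keep the factors invertible.
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    # (A ⊗ G)^{-1} vec(grad) = vec(G^{-1} grad A^{-1}); use solves, not explicit inverses.
    left = np.linalg.solve(G_d, grad_W)      # G^{-1} grad
    return np.linalg.solve(A_d, left.T).T    # (G^{-1} grad) A^{-1}, using symmetry of A_d
```

Working with the two factors separately means solving against an $n_{in} \times n_{in}$ and an $n_{out} \times n_{out}$ matrix instead of one $(n_{in} n_{out}) \times (n_{in} n_{out})$ matrix, which is the usual reason for the Kronecker factorization.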

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Machine Learning in Materials Science

Methods: Softmax · Attention Is All You Need