Accelerating AI Performance using Anderson Extrapolation on GPUs

Saleem Abdul Fattah Ahmed Al Dajani; David E. Keyes

arXiv:2410.19460 · cs.LG · December 20, 2024


TL;DR

This paper applies Anderson extrapolation to accelerate AI training and inference on GPUs, reducing the number of iterations to convergence and improving scalability while balancing accuracy and stability.

Contribution

It presents an approach that uses Anderson extrapolation to improve GPU-based AI performance: fewer, more compute-intensive iterations reach convergence, with speed balanced against accuracy and memory usage against algorithmic stability.

Findings

01 Significant speedups in training and inference.

02 Effective reduction in iteration count for convergence.

03 Enhanced scalability in high-performance computing environments.

Abstract

We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation, a vector-to-vector mapping technique based on a window of historical iterations. By identifying the crossover point (Fig. 1) where a mixing penalty is incurred, the method focuses on reducing iterations to convergence, using fewer, more compute-intensive but generally cacheable iterations, and balancing speed against accuracy and memory usage against algorithmic stability. We demonstrate significant improvements in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing (HPC).
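
To make the idea of extrapolating from a window of historical iterations concrete, here is a minimal CPU sketch of windowed Anderson extrapolation for a fixed-point problem x = g(x), written with NumPy. It is not the authors' GPU implementation: the function names `plain_fixed_point` and `anderson_fixed_point`, the window size `m`, the tolerance, and the toy linear map are all assumptions chosen for illustration.

```python
import numpy as np


def plain_fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Baseline (hypothetical helper): iterate x <- g(x) until ||g(x) - x|| < tol."""
    x = np.asarray(x0, dtype=float)
    for n_evals in range(1, max_iter + 1):
        gx = g(x)
        if np.linalg.norm(gx - x) < tol:
            return gx, n_evals
        x = gx
    return x, max_iter


def anderson_fixed_point(g, x0, m=5, tol=1e-10, max_iter=1000):
    """Windowed Anderson extrapolation (illustrative sketch, no damping).

    Stores the last m+1 values of g(x) and residuals f = g(x) - x, then
    combines them through a small least-squares problem at every step.
    """
    x = np.asarray(x0, dtype=float)
    g_hist, f_hist = [], []
    for n_evals in range(1, max_iter + 1):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx, n_evals
        # Append to the history and truncate it to the window.
        g_hist = (g_hist + [gx])[-(m + 1):]
        f_hist = (f_hist + [f])[-(m + 1):]
        if len(f_hist) == 1:
            x = gx  # no history yet: take a plain step
            continue
        # Columns are differences between consecutive history entries.
        dF = np.diff(np.stack(f_hist, axis=1), axis=1)
        dG = np.diff(np.stack(g_hist, axis=1), axis=1)
        # Coefficients that minimise ||f - dF @ gamma|| in the 2-norm.
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma
    return x, max_iter


# Toy linear contraction g(x) = A x + b (an assumption, not from the paper).
A = np.diag([0.9, 0.5, 0.1])
b = np.ones(3)
g = lambda x: A @ x + b

x_exact = np.linalg.solve(np.eye(3) - A, b)
x_plain, n_plain = plain_fixed_point(g, np.zeros(3))
x_aa, n_aa = anderson_fixed_point(g, np.zeros(3))
print(f"plain iteration: {n_plain} evaluations of g")
print(f"Anderson (m=5):  {n_aa} evaluations of g")
```

Each accelerated step adds a least-squares solve over the window and the memory needed to hold the history, in exchange for fewer evaluations of g. This is the kind of trade-off between iteration count, per-iteration cost, and memory that the abstract describes.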

