Sustainable AI Training via Hardware-Software Co-Design on NVIDIA, AMD, and Emerging GPU Architectures

Yashasvi Makin; Rahul Maliakkal

arXiv:2508.13163·cs.AR·August 20, 2025

TL;DR

This paper investigates hardware-software co-design techniques across NVIDIA, AMD, and emerging GPU architectures to significantly improve energy efficiency in AI training, addressing sustainability challenges in large-scale deep learning.

Contribution

It introduces novel hardware-software co-design strategies tailored for advanced GPU architectures to enhance energy efficiency in AI training.

Findings

01 Energy efficiency increases through specialized tensor and matrix cores

02 Advanced memory optimization methods improve performance-per-watt

03 Real-world case studies demonstrate practical sustainability improvements

Abstract

Training large-scale deep learning and artificial intelligence models consumes substantial computational power and energy, which poses serious sustainability issues. The rapid rise in model complexity has driven exponential increases in energy consumption, increasing the demand for techniques that maximize computational efficiency and lower environmental impact. This work explores environmentally driven performance optimization methods intended for advanced GPU architectures from NVIDIA and AMD, as well as emerging GPU architectures. Our main focus is on hardware-software co-design techniques meant to significantly improve the efficiency of memory-level and kernel-level operations, thereby improving performance-per-watt. Our research encompasses evaluations of specialized tensor and matrix cores, advanced memory optimization methods, and creative integration…
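
As an illustration of the performance-per-watt measure the abstract refers to, the sketch below computes it from a training run's throughput and energy use. This is a minimal example under stated assumptions, not the paper's methodology: the function name `performance_per_watt`, its parameters, and the sample numbers are all hypothetical, and performance is assumed to mean training samples processed per second.

```python
def performance_per_watt(samples_processed, elapsed_seconds, energy_joules):
    """Return throughput per watt, in (samples/s) per W.

    Assumption: "performance" is training throughput in samples per second.
    Average power is energy divided by time, so the result is numerically
    equal to samples processed per joule.
    """
    if elapsed_seconds <= 0 or energy_joules <= 0:
        raise ValueError("elapsed_seconds and energy_joules must be positive")
    throughput = samples_processed / elapsed_seconds    # samples per second
    average_power_watts = energy_joules / elapsed_seconds  # joules per second
    return throughput / average_power_watts


# Hypothetical measurements for two configurations of the same workload.
baseline = performance_per_watt(
    samples_processed=1000, elapsed_seconds=10, energy_joules=5000
)
optimized = performance_per_watt(
    samples_processed=1000, elapsed_seconds=8, energy_joules=3200
)
relative_gain = optimized / baseline
```

Because the elapsed time cancels out, two runs over the same number of samples can be compared on total energy alone; the elapsed time matters only when throughput and power are reported separately.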
