Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language Models

Mohammad Saleh Vahdatpour; Yanqing Zhang

arXiv:2603.23668 · cs.AR · March 26, 2026

Open Access

TL;DR

This paper reviews energy-efficient software-hardware co-design strategies for machine learning across diverse scales, emphasizing architectural innovations and system techniques to reduce energy consumption and improve sustainability.

Contribution

It provides a comprehensive overview of co-design methods from TinyML to large language models, highlighting common trade-offs, gaps, and a hierarchical framework for optimization.

Findings

01. Identifies key design levers and trade-offs in energy-efficient ML systems.

02. Highlights gaps such as limited cross-platform generalization and costly search spaces.

03. Proposes a hierarchical decomposition approach for incremental optimization.

Abstract

The rapid deployment of machine learning across platforms, from milliwatt-class TinyML devices to large language models, has made energy efficiency a primary constraint for sustainable AI. Across these scales, performance and energy are increasingly limited by data movement and memory-system behavior rather than by arithmetic throughput alone. This work reviews energy-efficient software-hardware co-design methods spanning edge inference and training to datacenter-scale LLM serving, covering accelerator architectures (e.g., ASIC/FPGA dataflows, processing-/compute-in-memory designs) and system-level techniques (e.g., partitioning, quantization, scheduling, and runtime adaptation). We distill common design levers and trade-offs, and highlight recurring gaps including limited cross-platform generalization, large and costly co-design search spaces, and inconsistent benchmarking across…
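
The abstract's point that energy is bounded by data movement rather than arithmetic alone can be illustrated with a first-order model: total energy is roughly (operations × energy per operation) + (bytes moved × energy per byte). The sketch below is not taken from the paper. The names `EnergyCosts`, `dense_layer_traffic`, and `layer_energy_pj` are hypothetical, the per-event costs are placeholders the caller must supply rather than measured values, and the traffic estimate assumes every weight, input, and output crosses the memory interface exactly once with no on-chip reuse.

```python
from dataclasses import dataclass


@dataclass
class EnergyCosts:
    """Per-event energy costs in picojoules.

    These are caller-supplied placeholders, not measurements from the paper.
    """
    pj_per_mac: float
    pj_per_byte_moved: float


def dense_layer_traffic(in_features: int, out_features: int, bytes_per_value: int):
    """Return (MAC count, bytes moved) for one dense layer on one input vector.

    Assumption: weights, inputs, and outputs each cross the memory
    interface once; caching and data reuse are not modelled.
    """
    macs = in_features * out_features
    values = in_features * out_features + in_features + out_features
    return macs, values * bytes_per_value


def layer_energy_pj(macs: int, bytes_moved: int, costs: EnergyCosts) -> dict:
    """Split estimated layer energy into compute and data-movement terms."""
    compute = macs * costs.pj_per_mac
    movement = bytes_moved * costs.pj_per_byte_moved
    total = compute + movement
    share = movement / total if total else 0.0
    return {
        "compute_pj": compute,
        "movement_pj": movement,
        "total_pj": total,
        "movement_share": share,
    }


if __name__ == "__main__":
    # Illustrative placeholder costs only; substitute values for a real platform.
    costs = EnergyCosts(pj_per_mac=1.0, pj_per_byte_moved=50.0)
    for label, width in (("fp32", 4), ("int8", 1)):
        macs, moved = dense_layer_traffic(1024, 1024, bytes_per_value=width)
        print(label, layer_energy_pj(macs, moved, costs))
```

Under this model, quantizing from 4-byte to 1-byte values cuts the movement term by a factor of four while the MAC count is unchanged, which is one reason quantization appears among the system-level levers listed above. The sketch holds the per-MAC cost fixed across precisions, which is a simplification.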

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy