Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language Models
Mohammad Saleh Vahdatpour, Yanqing Zhang

TL;DR
This paper reviews energy-efficient software-hardware co-design strategies for machine learning across diverse scales, emphasizing architectural innovations and system techniques to reduce energy consumption and improve sustainability.
Contribution
It provides a comprehensive overview of co-design methods from TinyML to large language models, highlighting common trade-offs, gaps, and a hierarchical framework for optimization.
Findings
Identifies key design levers and trade-offs in energy-efficient ML systems.
Highlights gaps such as limited cross-platform generalization and costly search spaces.
Proposes a hierarchical decomposition approach for incremental optimization.
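To make the contrast between a costly joint search and a level-by-level decomposition concrete, here is a minimal sketch. It is not the paper's framework: the levers in `LEVELS`, their option values, and the `toy_energy` cost function are all hypothetical, chosen only to show how evaluation counts scale (product of option counts versus their sum).

```python
from itertools import product

# Hypothetical design levers; names and values are illustrative assumptions.
LEVELS = {
    "bitwidth": [4, 8, 16],
    "tile_size": [8, 16, 32, 64],
    "batch": [1, 2, 4],
}

def toy_energy(cfg):
    """Made-up, separable cost. Not an energy model from the paper."""
    return (cfg["bitwidth"] / 8.0
            + abs(cfg["tile_size"] - 32) / 32.0
            + 1.0 / cfg["batch"])

def joint_search(levels, cost):
    """Exhaustively evaluate every combination of options."""
    names = list(levels)
    best, best_cost, evals = None, None, 0
    for combo in product(*(levels[n] for n in names)):
        cfg = dict(zip(names, combo))
        c = cost(cfg)
        evals += 1
        if best_cost is None or c < best_cost:
            best, best_cost = cfg, c
    return best, evals

def hierarchical_search(levels, cost):
    """Optimize one level at a time, holding the other levels fixed."""
    cfg = {name: opts[0] for name, opts in levels.items()}
    evals = 0
    for name, opts in levels.items():
        best_opt, best_cost = cfg[name], None
        for opt in opts:
            c = cost(dict(cfg, **{name: opt}))
            evals += 1
            if best_cost is None or c < best_cost:
                best_opt, best_cost = opt, c
        cfg[name] = best_opt
    return cfg, evals

if __name__ == "__main__":
    print(joint_search(LEVELS, toy_energy))
    print(hierarchical_search(LEVELS, toy_energy))
```

Because the toy cost is separable, both searches land on the same configuration, with 36 evaluations for the joint search and 10 for the level-by-level one. When levers interact strongly, a single greedy pass like this can miss the joint optimum, which is the trade-off any decomposition has to manage.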
Abstract
The rapid deployment of machine learning across platforms, from milliwatt-class TinyML devices to large language models, has made energy efficiency a primary constraint for sustainable AI. Across these scales, performance and energy are increasingly limited by data movement and memory-system behavior rather than by arithmetic throughput alone. This work reviews energy-efficient software-hardware co-design methods spanning edge inference and training to datacenter-scale LLM serving, covering accelerator architectures (e.g., ASIC/FPGA dataflows, processing-/compute-in-memory designs) and system-level techniques (e.g., partitioning, quantization, scheduling, and runtime adaptation). We distill common design levers and trade-offs, and highlight recurring gaps including limited cross-platform generalization, large and costly co-design search spaces, and inconsistent benchmarking across…
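As an illustration of why quantization appears among the system-level levers when data movement dominates, the sketch below applies symmetric per-tensor int8 quantization and compares the bytes needed to move a tensor at 32 versus 8 bits per value. This is a generic textbook scheme, not a method taken from the paper; the function names and sample weights are assumptions made for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate real values."""
    return [x * scale for x in q]

def bytes_moved(num_values, bits_per_value):
    """Bytes required to transfer num_values at the given precision."""
    return num_values * bits_per_value // 8

if __name__ == "__main__":
    weights = [1.27, -0.64, 0.0]  # hypothetical sample weights
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(dequantize(q, scale))
    print(bytes_moved(1000, 32), bytes_moved(1000, 8))
```

Moving 1000 values takes 4000 bytes at 32 bits and 1000 bytes at 8 bits, a 4x reduction in memory traffic, at the price of a rounding error of at most half a quantization step per value.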
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy
