Hardware Acceleration for Neural Networks: A Comprehensive Survey
Bin Xu, Ayan Banerjee, Sandeep Gupta

TL;DR
This survey comprehensively reviews hardware acceleration techniques for neural networks, covering architectures, optimization strategies, and open challenges across various workloads and deployment settings.
Contribution
It provides a unified taxonomy and synthesis of current hardware architectures, software tools, and emerging trends in neural network acceleration.
Findings
Highlights key architectural innovations such as systolic arrays and specialized kernels.
Identifies open challenges such as efficient long-context LLM inference and sparse workload support.
Discusses future directions including energy efficiency and fair benchmarking.
Abstract
Neural networks have become dominant computational workloads across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement, communication, and irregular operators rather than peak arithmetic throughput. This survey reviews the current technology landscape for hardware acceleration of deep learning, spanning GPUs and tensor-core architectures, domain-specific accelerators (TPUs, NPUs), FPGA-based designs, ASIC inference engines, and emerging LLM-serving accelerators such as LPUs, alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the survey using a unified taxonomy across (i) workloads (CNNs, RNNs, GNNs, Transformers/LLMs), (ii) execution settings (training vs. inference; datacenter vs. edge), and (iii) optimization levers (reduced precision,…
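The abstract's point that bottlenecks are often set by memory movement rather than peak arithmetic throughput can be illustrated with a roofline-style estimate. The sketch below is not taken from the survey: the device numbers (`PEAK_FLOPS`, `MEM_BANDWIDTH`) are hypothetical, the helper names are made up for illustration, and the traffic model assumes each operand and the output cross the memory interface exactly once.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def matmul_intensity(m, n, k, bytes_per_elem):
    """FLOPs per byte for an (m x k) @ (k x n) product.

    Assumption: A, B, and the output are each moved through memory once.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# Hypothetical accelerator, chosen only to make the arithmetic easy to follow.
PEAK_FLOPS = 100e12      # 100 TFLOP/s
MEM_BANDWIDTH = 1e12     # 1 TB/s
RIDGE = PEAK_FLOPS / MEM_BANDWIDTH  # intensity needed to reach peak compute

# Matrix-vector product (batch of 1, as in token-by-token decoding), 16-bit values.
gemv = matmul_intensity(1, 4096, 4096, bytes_per_elem=2)
# Large square matrix-matrix product, 16-bit values.
gemm = matmul_intensity(4096, 4096, 4096, bytes_per_elem=2)

for name, ai in [("gemv", gemv), ("gemm", gemm)]:
    bound = "memory-bound" if ai < RIDGE else "compute-bound"
    tflops = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, ai) / 1e12
    print(f"{name}: intensity={ai:.2f} FLOP/byte, {bound}, <= {tflops:.2f} TFLOP/s")
```

Under these assumptions the matrix-vector case sits far below the ridge point and is limited by bandwidth, while the large matrix-matrix case can reach the compute peak; real devices add caching, tiling, and communication effects that this simple model ignores.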
Taxonomy
Topics: Advanced Neural Network Applications · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques
