Hardware Acceleration for Neural Networks: A Comprehensive Survey

Bin Xu; Ayan Banerjee; Sandeep Gupta

arXiv:2512.23914 · eess.SY · January 16, 2026

Open Access

TL;DR

This survey comprehensively reviews hardware acceleration techniques for neural networks, covering architectures, optimization strategies, and open challenges across various workloads and deployment settings.

Contribution

It provides a unified taxonomy and synthesis of current hardware architectures, software tools, and emerging trends in neural network acceleration.

Findings

01

Highlights key architectural innovations like systolic arrays and specialized kernels.

02

Identifies open challenges such as efficient long-context LLM inference and sparse workload support.

03

Discusses future directions including energy efficiency and fair benchmarking.
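
The findings single out systolic arrays as a key architectural innovation. The sketch below illustrates the idea and is not taken from the survey. It is a minimal cycle-level simulation of an output-stationary systolic array computing C = A·B. Operands move one processing element (PE) per cycle, rightward for A and downward for B. Row i of A and column j of B enter with skews of i and j cycles, so A[i][s] and B[s][j] meet at PE(i, j) on the same cycle. The output-stationary dataflow and the name `systolic_matmul` are illustrative assumptions.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    Returns the product and the number of cycles the array needs.
    """
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    m = len(B[0])
    acc = [[0] * m for _ in range(n)]       # stationary partial sum held in each PE
    a_reg = [[None] * m for _ in range(n)]  # A operand each PE forwards to the right
    b_reg = [[None] * m for _ in range(n)]  # B operand each PE forwards downward
    total_cycles = k + n + m - 2            # last operands meet at PE(n-1, m-1)
    for t in range(total_cycles):
        new_a = [[None] * m for _ in range(n)]
        new_b = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge: row i of A is injected with a skew of i cycles.
                if j == 0:
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else None
                else:
                    a_in = a_reg[i][j - 1]  # value the left neighbour held last cycle
                # Top edge: column j of B is injected with a skew of j cycles.
                if i == 0:
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else None
                else:
                    b_in = b_reg[i - 1][j]  # value the upper neighbour held last cycle
                # Multiply-accumulate only when both operands are present.
                if a_in is not None and b_in is not None:
                    acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return acc, total_cycles
```

For example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `([[19, 22], [43, 50]], 4)`. Each operand is read from memory once at the array edge and then reused by every PE along its row or column. That operand reuse is what lets systolic designs keep many multipliers busy without a matching increase in memory traffic.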

Abstract

Neural networks have become dominant computational workloads across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement, communication, and irregular operators rather than peak arithmetic throughput. This survey reviews the current technology landscape for hardware acceleration of deep learning, spanning GPUs and tensor-core architectures, domain-specific accelerators (TPUs, NPUs), FPGA-based designs, ASIC inference engines, and emerging LLM-serving accelerators such as LPUs, alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the survey using a unified taxonomy across (i) workloads (CNNs, RNNs, GNNs, Transformers/LLMs), (ii) execution settings (training vs. inference; datacenter vs. edge), and (iii) optimization levers (reduced precision,…
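
The abstract says bottlenecks are now set more by memory movement than by peak arithmetic throughput. The standard roofline model makes that claim concrete. The sketch below is illustrative and does not come from the survey. The accelerator figures (100 TFLOP/s peak, 2 TB/s bandwidth), the 2-byte (FP16) element size, the ideal-reuse traffic assumption, and the function names are all hypothetical.

```python
# Hypothetical accelerator parameters, chosen only for illustration.
PEAK_FLOPS = 100e12   # 100 TFLOP/s peak arithmetic throughput
MEM_BW = 2e12         # 2 TB/s off-chip memory bandwidth


def gemm_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming each operand
    and the output cross off-chip memory exactly once (ideal reuse)."""
    flops = 2 * m * n * k                                   # multiply + add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, read B, write C
    return flops / bytes_moved


def attainable_flops(intensity, peak=PEAK_FLOPS, bw=MEM_BW):
    """Roofline model: throughput is capped by compute or by memory bandwidth."""
    return min(peak, bw * intensity)


def bound(intensity, peak=PEAK_FLOPS, bw=MEM_BW):
    # The ridge point peak / bw (50 FLOP/byte here) separates the two regimes.
    return "compute-bound" if intensity >= peak / bw else "memory-bound"


if __name__ == "__main__":
    for name, shape in [("GEMV (batch 1)", (1, 4096, 4096)),
                        ("GEMM (4096^3)", (4096, 4096, 4096))]:
        ai = gemm_intensity(*shape)
        print(f"{name}: {ai:.1f} FLOP/B, "
              f"{attainable_flops(ai) / 1e12:.1f} TFLOP/s, {bound(ai)}")
```

With these assumed numbers, the matrix-vector product reaches only about 1 FLOP/byte. That is far below the 50 FLOP/byte ridge point, so the kernel is memory-bound at roughly 2% of peak. The large square GEMM exceeds 1,000 FLOP/byte and can reach full peak throughput.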

Taxonomy

Topics: Advanced Neural Network Applications · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques