Comprehensive Design Space Exploration for Tensorized Neural Network Hardware Accelerators

Jinsong Zhang; Minghe Li; Jiayi Tian; Jinming Lu; Zheng Zhang

arXiv:2511.17971 · cs.AR · November 26, 2025

Open Access

TL;DR

This paper introduces a co-exploration framework that jointly optimizes tensor contraction paths, hardware architecture, and dataflow mapping to deploy tensorized neural networks more efficiently on edge devices, reporting up to 4x lower inference latency and up to 3.85x lower training latency.

Contribution

It presents a unified design space and a latency-driven search method for optimizing tensorized neural network deployment on hardware, addressing the gap between algorithmic and hardware-aware design.

Findings

01. Achieves up to 4x lower inference latency

02. Achieves up to 3.85x lower training latency

03. Demonstrates effectiveness on FPGA hardware

Abstract

High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages, such as accuracy and compression ratio, while overlooking hardware deployment efficiency. Such hardware-unaware designs often obscure the potential latency and energy benefits of tensorized models. Although several works attempt to reduce computational cost by optimizing the contraction sequence based on the number of multiply-accumulate operations, they typically neglect the underlying hardware characteristics, resulting in suboptimal real-world performance. We observe that the contraction path, hardware architecture, and dataflow mapping are tightly coupled and must be optimized jointly within a unified design space to maximize deployment efficiency on real devices. To this end, we propose a…
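
To make the contraction-sequence point concrete, the following minimal sketch counts multiply-accumulate operations (MACs) for two ways of contracting the same chain of matrices. It is an illustration only, not the paper's framework: the function names `matmul_macs` and `chain_macs`, and the layer shape (a batch-1 input through a rank-16 factorization of a 256 x 256 layer), are assumptions chosen for the example. As the abstract notes, MAC count alone does not determine latency on real hardware, so this shows only the algorithmic side of the trade-off.

```python
def matmul_macs(rows, inner, cols):
    """MACs for a dense (rows x inner) @ (inner x cols) product."""
    return rows * inner * cols


def chain_macs(dims, order):
    """Total MACs for contracting a matrix chain in a fixed order.

    dims:  sizes [d0, d1, ..., dn]; matrix i has shape dims[i] x dims[i + 1].
    order: "left"  contracts ((M0 M1) M2) ...,
           "right" contracts M0 (M1 (M2 ...)).
    """
    n = len(dims) - 1  # number of matrices in the chain
    total = 0
    if order == "left":
        # Running result has shape d0 x d_i; multiply by M_i (d_i x d_{i+1}).
        for i in range(1, n):
            total += matmul_macs(dims[0], dims[i], dims[i + 1])
    elif order == "right":
        # Running result has shape d_i x d_n; multiply M_{i-1} into it.
        for i in range(n - 1, 0, -1):
            total += matmul_macs(dims[i - 1], dims[i], dims[n])
    else:
        raise ValueError("order must be 'left' or 'right'")
    return total


# Hypothetical example: x (1 x 256) @ U (256 x 16) @ V (16 x 256).
dims = [1, 256, 16, 256]
left = chain_macs(dims, "left")    # (x U) V
right = chain_macs(dims, "right")  # x (U V)
print(f"(xU)V: {left} MACs, x(UV): {right} MACs")
```

For these assumed shapes, contracting the input with the first factor before the second needs 8192 MACs, while forming the dense product of the factors first needs 1114112. Which order runs faster on a given accelerator also depends on the architecture and dataflow, which is the coupling the abstract describes.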

Taxonomy

Topics: Low-power high-performance VLSI design · Advanced Neural Network Applications · Model Reduction and Neural Networks