Sparse Tucker Tensor Decomposition on a Hybrid FPGA-CPU Platform
Weiyun Jiang, Kaiqi Zhang, Colin Yu Lin, Feng Xing, and Zheng Zhang

TL;DR
This paper presents a hybrid FPGA-CPU platform that accelerates sparse Tucker tensor decomposition, significantly improving speed and energy efficiency for processing high-dimensional sparse data in various applications.
Contribution
It introduces a novel hybrid FPGA-CPU framework that accelerates key tensor operations, achieving substantial speedup and energy savings over CPU-only implementations.
Findings
Achieves up to 1091x speedup over CPU.
Provides over 93.5% energy savings.
Effectively accelerates tensor-times-matrix and Kronecker modules on FPGA.
Abstract
Recommendation systems, social network analysis, medical imaging, and data mining often involve processing sparse high-dimensional data. Such high-dimensional data are naturally represented as tensors, and they cannot be efficiently processed by conventional matrix or vector computations. Sparse Tucker decomposition is an important algorithm for compressing and analyzing these sparse high-dimensional data sets. When energy efficiency and data privacy are major concerns, hardware accelerators on resource-constrained platforms become crucial for the deployment of tensor algorithms. In this work, we propose a hybrid computing framework containing CPU and FPGA to accelerate sparse Tucker factorization. This algorithm has three main modules: tensor-times-matrix (TTM), Kronecker products, and QR decomposition with column pivoting (QRP). In addition, we accelerate the former two modules on a…
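To make the TTM and Kronecker modules named in the abstract concrete, here is a minimal pure-Python sketch of both operations on a small sparse 3-way tensor. It is an illustration under stated assumptions, not the paper's FPGA design: the coordinate-dictionary storage, the function names (kron_rows, ttm_sparse, ttm_chain_via_kron), and the toy data are all hypothetical, and the QRP step is omitted. The sketch shows that a chain of sparse TTMs along modes 1 and 2 gives the same mode-0 unfolding as accumulating, for each nonzero, a Kronecker product of the matching factor rows.

```python
from collections import defaultdict

def kron_rows(a, b):
    """Kronecker product of two row vectors stored as plain lists."""
    return [x * y for x in a for y in b]

def ttm_sparse(tensor, factor, mode):
    """Sparse tensor-times-matrix: X x_mode factor^T.

    tensor: dict {(i0, i1, ...): value} holding only nonzeros (assumed layout).
    factor: list of rows with shape I_mode x R.
    Returns a dict in the same layout with size R along `mode`.
    """
    out = defaultdict(float)
    for idx, val in tensor.items():
        for r, u in enumerate(factor[idx[mode]]):
            if u != 0.0:
                out[idx[:mode] + (r,) + idx[mode + 1:]] += val * u
    return dict(out)

def ttm_chain_via_kron(tensor, B, C, n_rows):
    """Mode-0 unfolding of X x_1 B^T x_2 C^T for a 3-way sparse tensor.

    Each nonzero x[i, j, k] adds x * kron(C[k], B[j]) to row i, so only
    rows of the factors touched by nonzeros are ever multiplied.
    """
    width = len(B[0]) * len(C[0])
    Y = [[0.0] * width for _ in range(n_rows)]
    for (i, j, k), val in tensor.items():
        for c, w in enumerate(kron_rows(C[k], B[j])):
            Y[i][c] += val * w
    return Y

# Toy 2 x 2 x 2 tensor with three nonzeros and two 2 x 2 factor matrices.
X = {(0, 0, 0): 1.0, (1, 1, 0): 2.0, (1, 0, 1): 3.0}
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 0.0], [0.5, 1.0]]

Y_kron = ttm_chain_via_kron(X, B, C, n_rows=2)

# Same result via two successive sparse TTMs, then unfolded along mode 0.
T = ttm_sparse(ttm_sparse(X, B, 1), C, 2)
R2 = len(B[0])
Y_ttm = [[0.0] * (R2 * len(C[0])) for _ in range(2)]
for (i, r2, r3), val in T.items():
    Y_ttm[i][r3 * R2 + r2] = val

print(Y_kron)
```

In a Tucker-style iteration, a matrix such as Y_kron would then be passed to a rank-revealing factorization (QRP in the abstract) to update the mode-0 factor; that step is left out here.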
