Hardware-Efficient Mixed-Precision CP Tensor Decomposition
Zi Yang, Junnan Shan, Zheng Zhang

TL;DR
This paper introduces a mixed-precision tensor decomposition method that significantly reduces memory and computational costs on modern hardware, validated through theoretical analysis and FPGA-based experiments.
Contribution
It proposes a novel mixed-precision block stochastic gradient descent approach for CP tensor decomposition with proven convergence and practical efficiency improvements.
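For orientation, the rank-R CP model and least-squares objective that such a method targets can be written as below for a three-way tensor. This uses common textbook notation (factor matrices A, B, C with columns a_r, b_r, c_r), which is an assumption here; the paper's own formulation and notation may differ.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
\min_{\mathbf{A},\,\mathbf{B},\,\mathbf{C}} \;
\frac{1}{2} \Bigl\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \Bigr\|_F^2
```

A block method updates one factor matrix at a time while holding the other two fixed.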
Findings
Achieves robust and fast convergence through a two-stage optimization: SignSGD followed by mixed-precision SGD.
Reduces memory and energy consumption on resource-limited devices.
Demonstrates superior efficiency over full-precision methods.
Abstract
Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often incurs large memory and computing costs. Meanwhile, modern computing hardware such as tensor processing units (TPUs) and Tensor Core GPUs has opened a new window of hardware-efficient computing via mixed- or low-precision arithmetic representations. In this paper, we exploit the low-precision representation of tensor factorization, and propose a mixed-precision block stochastic gradient descent (SGD) method to reduce the costs of CP tensor decomposition. Our method achieves robust and fast convergence via a two-stage optimization, i.e., SignSGD followed by mixed-precision SGD. Detailed theoretical analysis is provided to prove the convergence of the proposed mixed-precision algorithm. Numerical experiments on both synthetic and realistic tensor…
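The two-stage idea described in the abstract (SignSGD first, then SGD on low-precision factors) can be illustrated with a small, hedged sketch. This is not the authors' algorithm: the fixed-point grid with 8 fractional bits, the stochastic rounding, the uniform entry sampling, the cyclic block order, the step sizes, and every function name (`quantize`, `predict`, `block_gradient`, `two_stage_cp`) are assumptions chosen only to keep the example short and runnable with the Python standard library.

```python
import math
import random

def quantize(x, frac_bits=8):
    # Stochastic rounding onto a fixed-point grid with spacing 2**-frac_bits.
    # Stand-in for low-precision storage; the bit width is an assumption.
    scale = 1 << frac_bits
    y = x * scale
    lo = math.floor(y)
    return (lo + (random.random() < y - lo)) / scale

def predict(F, idx):
    # CP model entry: sum_r A[i][r] * B[j][r] * C[k][r]
    rank = len(F[0][0])
    return sum(math.prod(F[n][idx[n]][r] for n in range(3)) for r in range(rank))

def mse(F, X, dims):
    # Mean squared reconstruction error over all tensor entries.
    return sum((predict(F, idx) - x) ** 2 for idx, x in X.items()) / len(X)

def block_gradient(F, X, dims, n, batch):
    # Minibatch gradient of 0.5 * squared error with respect to factor n only.
    rank = len(F[n][0])
    g = [[0.0] * rank for _ in range(dims[n])]
    for _ in range(batch):
        idx = tuple(random.randrange(d) for d in dims)
        err = predict(F, idx) - X[idx]
        for r in range(rank):
            others = math.prod(F[m][idx[m]][r] for m in range(3) if m != n)
            g[idx[n]][r] += err * others / batch
    return g

def two_stage_cp(F, X, dims, sign_iters=300, sgd_iters=300,
                 sign_lr=0.01, sgd_lr=0.5, batch=32):
    # Updates the factor list F in place.
    rank = len(F[0][0])
    for it in range(sign_iters + sgd_iters):
        n = it % 3  # cycle through the three factor blocks
        g = block_gradient(F, X, dims, n, batch)
        for i in range(dims[n]):
            for r in range(rank):
                if it < sign_iters:
                    # Stage 1: SignSGD uses only the sign of each gradient entry.
                    F[n][i][r] -= sign_lr * ((g[i][r] > 0) - (g[i][r] < 0))
                else:
                    # Stage 2: SGD step, then store the entry on the low-precision grid.
                    F[n][i][r] = quantize(F[n][i][r] - sgd_lr * g[i][r])

random.seed(0)
dims, rank = (5, 5, 5), 2
# Synthetic rank-2 tensor built from random ground-truth factors.
truth = [[[random.random() for _ in range(rank)] for _ in range(d)] for d in dims]
X = {(i, j, k): predict(truth, (i, j, k))
     for i in range(dims[0]) for j in range(dims[1]) for k in range(dims[2])}
F = [[[random.random() for _ in range(rank)] for _ in range(d)] for d in dims]

loss_before = mse(F, X, dims)
two_stage_cp(F, X, dims)
loss_after = mse(F, X, dims)
print(loss_before, loss_after)
```

In this sketch the gradient is accumulated in ordinary Python floats while only the stored factors are rounded, which is one simple way to mix precisions. Stochastic rounding is used because it is unbiased, so small SGD steps are not systematically rounded away.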
Taxonomy
Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
