Hardware-Efficient Mixed-Precision CP Tensor Decomposition

Zi Yang; Junnan Shan; Zheng Zhang

arXiv:2209.04003 · math.OC · September 12, 2022

Open Access

TL;DR

This paper introduces a mixed-precision tensor decomposition method that significantly reduces memory and computational costs on modern hardware, validated through theoretical analysis and FPGA-based experiments.

Contribution

The paper proposes a mixed-precision block stochastic gradient descent (SGD) method for CP tensor decomposition, supported by a convergence proof and reported efficiency gains.
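
For orientation, the rank-R CP model of a three-way tensor, its least-squares objective, and the standard sign-based update for one factor block can be written as below. The notation (factor matrices A, B, C with columns a_r, b_r, c_r, step size alpha, and a stochastic gradient estimate marked with a hat) is chosen here for illustration and may differ from the paper's.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
f(A, B, C) = \frac{1}{2} \Big\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \Big\|_F^2

% SignSGD step on the block A: only the sign of each gradient entry is used
A \leftarrow A - \alpha \, \operatorname{sign}\!\big( \widehat{\nabla}_A f(A, B, C) \big)
```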

Findings

01 Achieves faster convergence with mixed-precision SGD.

02 Reduces memory and energy consumption on resource-limited devices.

03 Demonstrates superior efficiency over full-precision methods.

Abstract

Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often incurs high memory and computing costs. Meanwhile, modern computing hardware such as tensor processing units (TPUs) and Tensor Core GPUs has opened a new window for hardware-efficient computing via mixed- or low-precision arithmetic representations. In this paper, we exploit low-precision representations in tensor factorization and propose a mixed-precision block stochastic gradient descent (SGD) method to reduce the cost of CP tensor decomposition. Our method achieves robust and fast convergence via a two-stage optimization, i.e., SignSGD followed by mixed-precision SGD. A detailed theoretical analysis is provided to prove the convergence of the proposed mixed-precision algorithm. Numerical experiments on both synthetic and realistic tensor…
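
The two-stage idea in the abstract (SignSGD first, then mixed-precision SGD, updating one factor block at a time) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the fixed-point `quantize` helper, the choice to evaluate gradients on a low-precision copy while updating a full-precision master copy, and all sizes, step counts, and learning rates are assumptions made here for illustration.

```python
import random

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits (simulated low precision)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def cp_entry(factors, idx):
    """Entry idx=(i, j, k) of the CP model: sum_r A[i][r] * B[j][r] * C[k][r]."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        prod = 1.0
        for mode, i in enumerate(idx):
            prod *= factors[mode][i][r]
        total += prod
    return total

def mse(tensor, factors):
    """Mean squared error of the CP model over all tensor entries."""
    return sum((cp_entry(factors, idx) - v) ** 2 for idx, v in tensor.items()) / len(tensor)

def block_step(tensor, keys, factors, mode, rng, lr, batch, use_sign, frac_bits):
    """Update only the factor matrix of `mode` from a sampled mini-batch of entries."""
    # Assumption (not from the paper): gradients are evaluated on a low-precision
    # copy of the factors, and the update is applied to the full-precision copy.
    low = [[[quantize(v, frac_bits) for v in row] for row in f] for f in factors]
    rank = len(factors[0][0])
    grad = {}
    for idx in rng.sample(keys, batch):
        err = cp_entry(low, idx) - tensor[idx]
        for r in range(rank):
            other = 1.0  # product of the other two modes' factor entries
            for m, i in enumerate(idx):
                if m != mode:
                    other *= low[m][i][r]
            key = (idx[mode], r)
            grad[key] = grad.get(key, 0.0) + 2.0 * err * other / batch
    for (i, r), g in grad.items():
        # Stage 1 uses only the gradient sign; stage 2 uses the gradient itself.
        factors[mode][i][r] -= lr * (sign(g) if use_sign else g)

def run_demo(seed=0, size=4, rank=2, frac_bits=8):
    """Fit a synthetic rank-`rank` tensor; returns (initial_mse, final_mse)."""
    rng = random.Random(seed)

    def rand_factors(lo, hi):
        return [[[rng.uniform(lo, hi) for _ in range(rank)]
                 for _ in range(size)] for _ in range(3)]

    truth = rand_factors(0.5, 1.5)
    tensor = {(i, j, k): cp_entry(truth, (i, j, k))
              for i in range(size) for j in range(size) for k in range(size)}
    keys = sorted(tensor)
    factors = rand_factors(0.0, 1.0)
    initial = mse(tensor, factors)
    for step in range(150):  # stage 1: SignSGD, cycling over the three blocks
        block_step(tensor, keys, factors, step % 3, rng, 0.02, 16, True, frac_bits)
    for step in range(600):  # stage 2: SGD with low-precision gradient evaluation
        block_step(tensor, keys, factors, step % 3, rng, 0.05, 16, False, frac_bits)
    return initial, mse(tensor, factors)

if __name__ == "__main__":
    start, end = run_demo()
    print(f"MSE before: {start:.4f}, after: {end:.6f}")
```

Calling `run_demo()` fits a 4×4×4 synthetic tensor and returns the error before and after training. Lowering `frac_bits` coarsens the simulated precision, which gives a rough feel for how quantization noise interacts with the two stages in this toy setting.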

Taxonomy

Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques